One of the points not mentioned in the article is that this analysis is useful even if (like me, tptacek, and many others) you don't trust Javascript crypto.
That's because Mega is designed as simply an API that any client (including one distributed as a desktop application) could implement. However, another client implementing this same spec would have the same weaknesses. It's important that we get the design improved now. Then we can go on to build open source clients outside of the perils of the JS runtime.
Edited to add: If anyone is interested in building such an open source desktop client in Python, I could help by making our (SpiderOak's) toolchains, packagers, and installers available.
Edited again to add: Assuming a solid design and implementation is eventually produced in Javascript, you could probably re-use most of the code and avoid the JS runtime dangers by repackaging it as a desktop app with something like http://appjs.org/ Remember that of course your privacy is based not just on you using a secure client, but everyone you share data with doing likewise.
Paying customer here, thank you for a wonderful product and service!
One question I have is why SpiderOak client is not open-source? As I understand, it is Python and decompiling it for analysis is not too much trouble, but still...
Thanks for your interest in SpiderOak! Indeed we will open source the SpiderOak desktop client one day. We've been saying that for a long time, but we're very close. One of the reasons we haven't done so previously is how open source products often get spammed with copycat repackagers, like the VLC guy was explaining the other day.
I should also note that the crypto code in SpiderOak is in one specific module, and we have repeatedly paid for careful audits of that code.
Lastly, we have a new crypto product in the works which we will announce at RSA in February, which is 100% open source, and of particular interest to developers in addition to SpiderOak customers. Please stay tuned!
Release early, release often works pretty well when the whole world is waiting for you to make a mistake. They are certainly getting a lot of valuable advice for free.
It would have been better for their users, however, if they'd tried to get this right before releasing. Mega smells very much like marketing over technology.
I can't help but look at 'Mega' and get the feeling that the 'Secure and Private' part is 99% Kim trying to protect himself from further lawsuits, rather than actually ensuring the privacy and security of user data. I hope I'm wrong of course.
Does it matter? In this case the ulterior motive (Kim protecting himself) happens to coincide with privacy, because if your information were not private then Mega could know it, which defeats the entire point of Mega being protected by not knowing it.
Yeah, looking at the news I don't agree with the people claiming Mega is trying to protect itself above users. It just seems genuinely flawed. I don't know what good the service would be without working privacy for the users.
I would agree, if it weren't for the fact that starting a 'somewhat edgy storage service' guarantees him publicity, investors, and the love of tens of millions of Megaupload users (and probably a few nice millions of $$ no matter the outcome).
Easier if you're starting from zero, but he already knows this way and a lot of his ego must be wrapped into this corner of the digital world. I don't believe he made an entirely rational cost/benefits decision with this.
A pretty decent read, the code excerpts were pretty eye opening in context. I wonder about the mentioned follow ups, especially the convergent encryption server side. Dotcom's strategy just doesn't seem fixable with some of the flaws in design. I'm new to the concept of js crypto, so it's interesting to see how much has already been considered.
> A "salt" is a small random value used to vary KDF output with the same password input. No salt means that any two users with the same passphrase will have the same KDF output.
Where would you get this salt from on the client side? That salt has to be the same every time a user tries to access their data from a different computer, otherwise you wouldn't be able to decrypt your files with the derived key, right? So how is this salt magically transferred from one computer to another? The alternative is that you get the salt from the server along with the encrypted file, but then I don't see how that improves security. Or perhaps what I'm saying here is pure nonsense, in which case I'd very much appreciate a clarification.
Great question. In the recommendations, we're suggesting using two salts. One for deriving the key, and one just for challenges. Both of them are stored plaintext on the server. The salt itself does not need to be kept secret -- its purpose is just to mitigate pre-computation attacks on the password and the resulting derived key.
Both salts are created by the client from random data at the time the account is created, and saved to the server.
So when a user wants to authenticate, you can just give them the plaintext of the challenge salt along with the issued challenge. There's no harm since it's just random data and shouldn't help them to calculate an answer to the challenge if they don't know the pass phrase. (Note that you only give them the challenge salt, not both salts.)
If you want the server to avoid disclosing the existence/non-existence of an account with a particular name (wise!) then just give them a made up random string as a challenge for non-existent accounts. Cache it somewhere so you can be consistent. Always rate limit login attempts.
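The flow described above can be sketched roughly like this (a minimal illustration with hypothetical names, not the actual Mega or SpiderOak API; the key point is that unknown accounts get a consistent fake salt so their nonexistence isn't revealed):

```python
import hashlib
import hmac
import os

SERVER_SECRET = os.urandom(32)  # used only to fabricate salts consistently
accounts = {}                   # username -> {"challenge_salt": bytes}

def register(username):
    # Challenge salt is random, created once, and stored in plaintext.
    accounts[username] = {"challenge_salt": os.urandom(16)}

def get_challenge(username):
    acct = accounts.get(username)
    if acct is not None:
        salt = acct["challenge_salt"]
    else:
        # Deterministic fake salt: the same made-up value is returned every
        # time for the same unknown name, so callers can't probe for accounts.
        salt = hmac.new(SERVER_SECRET, username.encode(),
                        hashlib.sha256).digest()[:16]
    challenge = os.urandom(16)  # fresh random challenge per login attempt
    return salt, challenge

register("alice")
real_salt, _ = get_challenge("alice")
fake1, _ = get_challenge("nobody")
fake2, _ = get_challenge("nobody")
assert fake1 == fake2  # consistent answer for a non-existent account
```

Rate limiting would still sit in front of all of this, as noted above.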
It does so far, and I see how this does no harm (as you say), but I also don't see how it improves security? Can you give an attack that would work against the unsalted system which wouldn't work against the salted system?
Without a salt one can pre-compute all the KDF (input, output) pairs and store them in a dictionary (essentially). This single dictionary now acts as a kind of master key.
By including a random per-user salt, even assuming the attacker has access to the salt, the output of the KDF now varies on both a per-user and per-password basis.
Two users with the same password have different keys under this design, so pre-computing something that works for all users becomes infeasible.
If the KDF is weak in other ways, e.g., it is so fast that one doesn't need to pre-compute anything, that's a separate issue.
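A minimal demonstration of the per-user salt property, using PBKDF2 from the Python standard library (the parameters here are illustrative, not a recommendation):

```python
import hashlib
import os

def derive_key(password, salt, iterations=100_000):
    # PBKDF2-HMAC-SHA256: same password + same salt -> same key,
    # same password + different salt -> different key.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)

salt_a, salt_b = os.urandom(16), os.urandom(16)
key_a = derive_key("hunter2", salt_a)
key_b = derive_key("hunter2", salt_b)

assert key_a != key_b                          # salts separate identical passwords
assert key_a == derive_key("hunter2", salt_a)  # reproducible given the stored salt
```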
1. You have a whole bunch of encrypted files from different users.
2. The KDF is slow compared to checking if decryption with a given output of the KDF is successful.
With a salt this will take time nm(T + Q) where n is the number of files, m is the number of different passwords and T is the time it takes to execute the KDF and Q is the time it takes to try to decrypt with a derived key. Without a salt you can amortize the KDF executions for a total time of mT + nmQ.
In particular it doesn't help you decrypt one specific file, and the total time is still on the order of n*m, so unless Q is very cheap compared to T this won't help much (obviously still a good idea to do the salting though).
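Plugging some made-up but plausible numbers into the two cost formulas above makes the amortization concrete (illustrative values only, not measurements):

```python
n = 1_000_000   # encrypted files / users
m = 10_000_000  # candidate passwords
T = 0.1         # seconds per KDF call (a deliberately slow KDF)
Q = 1e-6        # seconds per trial decryption

salted   = n * m * (T + Q)   # every (user, password) pair needs its own KDF call
unsalted = m * T + n * m * Q # KDF outputs computed once, reused across all users

# With T >> Q, the salt multiplies the attacker's KDF work by roughly n.
print(salted / unsalted)
```

Which side dominates depends entirely on the T/Q ratio, which is the point being made above: a slow KDF makes the salt matter enormously, a fast one makes it matter less.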
I think you're confusing encryption and KDFs. Encryption is meant to be reversible, KDFs are not.
In this context, a KDF is used for verifying that a message is valid. I have a derived key, you give me a key, and I verify that key by running it through the KDF. If KDF(your_key) == derived_key then we have "proven" that your_key is valid.
Another way of thinking about it is that the image of a KDF should map low-entropy input to high-entropy output. Depending on what we're using the KDF for, we might also want it to have other properties, e.g., intentionally slow, intentionally memory intensive, etc.
This is different than you encrypting something, sending me the encrypted message, and me decrypting it using a shared secret.
I don't think I said anything that disagrees with this? In this case the output of the KDF is used as a key to decrypt a further key, which is in turn used to decrypt your file. So an attacker can't just do a simple check to see that he guessed the right password; he actually needs to decrypt the file and see whether garbage comes out or an actual decrypted file.
The key derivation function's purpose is to take a long time (CPU time) to compute an output key given a password input. The combination of dangers here is that the KDF is too fast (doesn't take enough CPU) AND the output is the same per user.
What we're trying to protect against is someone in possession of the data (such as Mega themselves, or anyone working for them, or hacking them, or seizing their assets, or just someone over the internet making an offline attempt against an account after discovering the auth hash) from making a brute force attempt to learn the plaintext content of the accounts.
So to help me crack accounts, I could make a database of outputs from the KDF, with every password combination pre-computed. Start with "aaaaaa" as a password, and go all the way through "ZZZZZZ" out to some number of characters. The database would have 32 output bytes per entry. The first 32 bytes would be the output of "aaaaaa" through the KDF, then "aaaaab", and so on.
If my math is right, going out to 6 characters using letters numbers and punctuation, the database (probably just a file really) would be about 5 TB in length, and based on my single slow CPU's AES speed, would take about 99 CPU days to compute. Of course if you have 100 CPUs (or EC2, or a botnet) you could have it done in time for dinner tomorrow. Or just wait because someone will likely precompute one you can bittorrent in the next few days.
So, if I wanted to make a brute force attack against Mega's database of users, I would use this pre-computed database, reading it sequentially from disk, taking every 32 byte section and trying it as a key to decrypt the next layer of keys for a user. My CPU could likely check answers faster than the data would stream in from the disk.
So I would be able to brute force any combination out to 6 password characters in a few minutes using only the spare hardware sitting around my office and unoptimized code. Imagine what NSA could do.
So, finally getting back to answering your question. If the salt were used, then I couldn't pre-compute the KDF output. I would have to do it separately for each user. So then my attempts would (again on my weak CPU) proceed at only about 520 attempts per second. That's still about 500 times too fast. So, the design needs both a slower KDF, and salt. :)
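A toy version of the precomputation attack described above, using a fast hash as a stand-in for a weak unsalted KDF (the password space here is tiny so it runs instantly; the real table would be terabytes, as estimated earlier):

```python
import hashlib
import itertools
import string

def weak_kdf(password):
    # Unsalted: every user with this password gets the identical output.
    return hashlib.sha256(password.encode()).digest()

# Precompute the table once, up front...
table = {weak_kdf("".join(chars)): "".join(chars)
         for chars in itertools.product(string.ascii_lowercase, repeat=3)}

# ...then crack any number of accounts by simple lookup, with zero
# additional KDF work per victim. This is the "master key" effect.
stolen = [weak_kdf("abc"), weak_kdf("zzz")]
recovered = [table[h] for h in stolen]
assert recovered == ["abc", "zzz"]
```

With a per-user salt, the table would have to be rebuilt for every single user, which is exactly the ~520-attempts-per-second slowdown described above.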
Trivia: Want to time your own CPU and how many AES cycles it can do?
$ openssl speed -evp aes-128-ecb
Doing aes-128-ecb for 3s on 16 size blocks: 102358287 aes-128-ecb's in 3.00s
>>> 102358287 / 3 / 2**16
520
Well it kinda does make sense, but I don't understand why you would go to this trouble of sending salts around. If the goal is what you describe, couldn't you just derive a salt from the username at the client side? (I was assuming that they were already doing this, but apparently they are not?)
Let's say everyone did as you propose and use usernames for salts. I could pre-compute dictionaries for usernames like root, admin, etc. and likely gain full access to a wide range of systems.
Our aim is to build a system where brute-forcing is always the cheapest option. Once that's the case, we're in a position to control "how expensive" it is for someone to invert even one derived key.
That would at most reduce the number of KDF calls an attacker would need to perform by a factor of 2, if (1) the other service is using exactly the same KDF (2) all the usernames are identical in the two services. If not all the usernames are identical, then the factor of 2 is even lower. If an attacker can perform n calls to a KDF then he can also perform 2n calls to it, so if you're relying on that factor of 2 for security you have a problem no matter what. In any case this is easily defeated by adding a random string: KDF(username + "|blablafoo|" + password).
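A sketch of the construction mentioned at the end there: the application-specific string is an arbitrary constant baked into the client, so nothing extra needs to be sent around (names and parameters here are illustrative):

```python
import hashlib

# Fixed, application-specific string; its only job is to make this
# service's KDF inputs disjoint from every other service's.
APP_STRING = b"|blablafoo|"

def derive(username, password, iterations=100_000):
    salt = username.encode() + APP_STRING
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)

# "root" here no longer shares a precomputation table with "root" on any
# other service, unless that service uses the identical application string.
assert derive("root", "hello") != derive("admin", "hello")
```

This still has the weakness the parent comments discuss (two users of *this* service with the same username can't exist, but the salt is predictable from public data), which is why the recommendation above prefers a random per-user salt stored on the server.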
Yes, this would be true in the case of adding a random string as a second salt but then this would have to be sent to the client also.
I am not sufficiently familiar with KDFs but I know that rainbow tables exist to crack straight up md5 hashed passwords for common usernames and passwords where the username has been used as the salt.
Maybe harder to do with a KDF because it's prohibitive to compute that many keys in the first place?
In an unsalted system, let's say your password is "hello" and my password is "hello." Your hash("hello") will be the same as my hash("hello"). When I see your hash is the same as mine, I know your password is "hello."
In a salted system, your hash("hello" + salt1) != my hash("hello" + salt2), and I have no way of guessing your password since the hashes are different.
In this case, I believe the [server provided] salt would be used solely as a precaution against rainbow tables, but there may be additional benefits of which I am unaware.
A small point, but the title here might bear changing. "Crypto analysis" suggests a typo of "cryptanalysis", which is not what the article is about. Perhaps "Analysis and Recommendations for the Crypto in Mega" might be a better title.
I wish we could too, but instead we rely on the time honored approach of charging our customers directly for our services, rather than trying to monetize with advertising or selling your data. We do offer 2GB for free for life though. Thanks for your interest regardless!
I used SpiderOak for a year (paid) and slowly grew to hate the client. Everything else about the service was great and I had a large amount of trust in the company, but the client was extremely slow and unpredictable.
I wish Dropbox and SpiderOak would mate and have a child product that was as smooth, fast and easy to use as Dropbox, but with full SpiderOak grade client side encryption.
The most interesting part of this whole Mega crypto drama is whether Mega will in fact fix their implementation, or whether it was perhaps made brute-forceable by design.
All this hate for Kim Dotcom aside, he didn't literally write it, some developer did, and if that developer is anything like me, he's going to want to take care of his baby, and these recommendations are going to immensely help him do that.
What would Dotcom gain from having a weak crypto system? Plausible deniability will work up to a point, but the government could still subpoena his encrypted data, crack it themselves, then require Dotcom to comply. And if Dotcom actually acted on (or even accessed) the data, then he would throw plausible deniability out the window by demonstrating that he does have access to it, in which case the entire exercise of creating the cryptosystem is pointless.