Scratch that. I've got an even better design than what I was thinking of above. It makes it so the service provider never has access to the unencrypted data, and they can fully de-dup immediately, and it supports all Dropbox features.
Let F be an arbitrary file.
Let N(F) be the name your client knows the file by.
Let H(F) be a 256-bit hash of the file.
Let AES(X,K) be X encrypted using AES with key K.
When you upload to the cloud, you upload AES(F,H(F)). In a local database, you store (N(F), H(F)). When you later retrieve the file from the cloud, you receive the encrypted data, and you can look up the key, H(F), in your local database.
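A minimal sketch of that upload/retrieve round trip, under some loud assumptions: AES is not in the Python standard library, so `toy_encrypt` below is a stand-in (a SHA-256 counter keystream XORed into the data), not real AES, and `cloud`, `local_db`, `upload` and `download` are hypothetical names made up for illustration.

```python
import hashlib

def H(data: bytes) -> bytes:
    """H(F): SHA-256 digest, used directly as the 256-bit key."""
    return hashlib.sha256(data).digest()

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Stand-in for AES(X, K), NOT real AES: XOR with a SHA-256 counter keystream.
    It is deterministic, and applying it twice with the same key decrypts."""
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hashlib.sha256(key + (i // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[i:i + 32], pad))
    return bytes(out)

cloud = {}     # hypothetical server-side storage: name -> ciphertext
local_db = {}  # client-side database of (N(F), H(F)) pairs

def upload(name: str, content: bytes) -> None:
    key = H(content)                         # key derived from the content itself
    cloud[name] = toy_encrypt(content, key)  # the server only ever sees this
    local_db[name] = key                     # the key stays on the client

def download(name: str) -> bytes:
    # Fetch the ciphertext, look up H(F) locally, and decrypt.
    return toy_encrypt(cloud[name], local_db[name])

upload("notes.txt", b"meeting at noon")
print(download("notes.txt"))
```

The point of the sketch is only the data flow: the ciphertext goes to the server, the (name, key) pair never leaves the client.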
Note that if two different users upload files with the same content, they pick the same encryption key (since the key comes from a hash of the content), and so the same data gets uploaded. The service can thus do de-duplication, even though it has no access to unencrypted data.
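Here is a rough sketch of why de-duplication falls out for free. Everything in it is an assumption for illustration: `DedupStore` is a hypothetical server, and `toy_encrypt` is a SHA-256 keystream standing in for AES (any deterministic cipher would show the same effect).

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Stand-in for AES(X, K), NOT real AES: XOR with a SHA-256 counter keystream."""
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hashlib.sha256(key + (i // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[i:i + 32], pad))
    return bytes(out)

class DedupStore:
    """Hypothetical server: stores each distinct ciphertext exactly once."""
    def __init__(self):
        self.blobs = {}   # digest of ciphertext -> ciphertext
        self.owners = {}  # digest of ciphertext -> set of user ids

    def put(self, user: str, ciphertext: bytes) -> str:
        digest = hashlib.sha256(ciphertext).hexdigest()
        self.blobs.setdefault(digest, ciphertext)        # no-op if already stored
        self.owners.setdefault(digest, set()).add(user)  # just record another owner
        return digest

store = DedupStore()
content = b"the same video file, byte for byte"

# Alice and Bob encrypt independently, on their own machines.
alice_id = store.put("alice", toy_encrypt(content, H(content)))
bob_id = store.put("bob", toy_encrypt(content, H(content)))

print(alice_id == bob_id, len(store.blobs))
```

Both uploads land on the same blob, and the server never needed the plaintext or the key to notice that.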
So far, all this provides is secure storage. What makes Dropbox useful is that a file uploaded on one computer can be downloaded on another, and that only works if the downloader knows H(F).
This is solved by also uploading a copy of that local database I mentioned, the one that stores the (N(F), H(F)) pairs. This can be encrypted with the account password.
Syncing between different devices on the same account is then a two-step process. First, the name/key database is synced so that both devices have access to the keys; then the files themselves can be synced.
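A sketch of the first step, syncing the name/key database under the account password. The assumptions: the key is derived with PBKDF2 from `hashlib` (the post does not say how the password becomes a key), `seal_db` and `open_db` are hypothetical helper names, and `toy_encrypt` is again a SHA-256 keystream standing in for AES.

```python
import hashlib
import json
import os

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Stand-in for AES(X, K), NOT real AES: XOR with a SHA-256 counter keystream."""
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hashlib.sha256(key + (i // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[i:i + 32], pad))
    return bytes(out)

def derive_key(password: str, salt: bytes) -> bytes:
    # Assumed key derivation: PBKDF2-HMAC-SHA256 with a per-upload random salt.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def seal_db(db: dict, password: str) -> bytes:
    """Serialize the (N(F), H(F)) pairs and encrypt them for upload."""
    salt = os.urandom(16)
    plaintext = json.dumps({name: key.hex() for name, key in db.items()}).encode()
    return salt + toy_encrypt(plaintext, derive_key(password, salt))

def open_db(blob: bytes, password: str) -> dict:
    """Inverse of seal_db, run on the second device."""
    salt, body = blob[:16], blob[16:]
    plaintext = toy_encrypt(body, derive_key(password, salt))
    return {name: bytes.fromhex(key) for name, key in json.loads(plaintext).items()}

# Device A knows one file and uploads its sealed database.
device_a_db = {"notes.txt": hashlib.sha256(b"meeting at noon").digest()}
uploaded = seal_db(device_a_db, "correct horse battery staple")

# Device B downloads it and now holds the same file keys.
device_b_db = open_db(uploaded, "correct horse battery staple")
print(device_b_db == device_a_db)
```

Once device B has the database, the second step is just the ordinary download-and-decrypt path for each file.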
I believe web access can be handled via this system. Dropbox's web interface requires JavaScript, so it could have the browser retrieve the name/key database and decrypt it using the account password, which gives it access to the key needed to decrypt a given file.
For shared folders, you can use a public key system: the keys for the shared files are encrypted with the public key of each person you are sharing the folder with, and the encrypted key files are stored in the cloud. Anyone accessing the shared folder grabs the key file for the folder and uses their private key (which is protected by the account password) to recover the key, H(F), for each file.
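A toy sketch of the key file idea, with heavy caveats. It uses textbook RSA with no padding, built from well-known Mersenne primes so it runs on the standard library alone (Python 3.8+ for the modular inverse). None of that is suitable for real use; a real system would use a vetted library and proper key generation. `make_keypair`, `wrap`, `unwrap` and the `key_file` layout are all hypothetical.

```python
import hashlib

E = 65537  # common RSA public exponent

def make_keypair(p: int, q: int):
    """Textbook RSA key pair from two primes (toy only, no padding)."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))  # modular inverse, needs Python 3.8+
    return (n, E), (n, d)

def wrap(file_key: bytes, public) -> int:
    """Encrypt a 256-bit file key under a member's public key."""
    n, e = public
    return pow(int.from_bytes(file_key, "big"), e, n)

def unwrap(wrapped: int, private) -> bytes:
    """Recover the file key with the member's private key."""
    n, d = private
    return pow(wrapped, d, n).to_bytes(32, "big")

# Known Mersenne primes stand in for real key generation (demo assumption).
alice_pub, alice_priv = make_keypair(2**521 - 1, 2**607 - 1)
bob_pub, bob_priv = make_keypair(2**1279 - 1, 2**2203 - 1)

file_key = hashlib.sha256(b"shared spreadsheet contents").digest()  # H(F)

# The folder's key file as stored in the cloud: member -> {name: wrapped H(F)}.
members = {"alice": alice_pub, "bob": bob_pub}
key_file = {m: {"budget.xls": wrap(file_key, pub)} for m, pub in members.items()}

# Bob fetches his entry and recovers H(F) with his private key.
print(unwrap(key_file["bob"]["budget.xls"], bob_priv) == file_key)
```

The server stores only wrapped keys, so adding a member to the folder means adding one more wrapped copy of each H(F), not re-encrypting the files.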
I believe this covers everything Dropbox does, with the properties that:
1. They can't decrypt your files.
2. They can de-duplicate completely.
3. Your account password is the key for everything for you.
4. It satisfies all of their advertising claims for security.
1. It is usually not a good idea to derive your key as a function of the message. Here, you would need H(M) to have good min-entropy when the message M is drawn from whatever distribution your files come from. I believe SHA-256, as of today, will satisfy these needs.
2. Even if H is modeled as a perfect hash function (i.e., a random oracle, in the crypto literature), you would require that AES itself does not use H in any special way. Consider a cipher AES' that behaves just like AES, except that when k = H(m) for the message m being encrypted, it just outputs k (i.e., cheats). It would be nearly impossible to detect this behaviour of AES' under normal circumstances because H is pre-image resistant, but clearly, this would trivially void the security of the scheme.
The suggestion is exactly what came to my mind, but the security proof, although it should most definitely hold when instantiated with AES and SHA-256, will require some work to be established in general.
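To make point 1 concrete, here is a small sketch of what happens when the message distribution has low entropy. The scenario (a form letter with a 4-digit PIN) and the helper `guess_plaintext` are made up for illustration, and `toy_encrypt` is a SHA-256 keystream standing in for AES; the attack only relies on the encryption being deterministic given the content.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Stand-in for AES(X, K), NOT real AES: XOR with a SHA-256 counter keystream."""
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hashlib.sha256(key + (i // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[i:i + 32], pad))
    return bytes(out)

def guess_plaintext(ciphertext: bytes, candidates):
    """Anyone holding the ciphertext can test guesses: encrypt each candidate
    under its own hash and compare against what is stored."""
    for candidate in candidates:
        if toy_encrypt(candidate, H(candidate)) == ciphertext:
            return candidate
    return None

# The user uploads a form letter; only the 4-digit PIN is unknown to the server.
secret = b"Your PIN is 4821"
stored = toy_encrypt(secret, H(secret))

# The server enumerates all 10,000 possible letters.
candidates = [f"Your PIN is {pin:04d}".encode() for pin in range(10000)]
recovered = guess_plaintext(stored, candidates)
print(recovered)
```

With only 10,000 possibilities the search is instant, which is why the scheme needs the files themselves to be unpredictable, no matter how good the hash is.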
There's still a big problem with de-duplication: Dropbox can still figure out which users have the same file, thus leaking information. That, combined with the fact that they'll already know the size of the file, gives them a lot of info.
For example, if the FBI seizes a computer and finds some illegal files, they can still request Dropbox to give a list of users that have the same file.
As has been mentioned elsewhere in the thread -- de-dupe isn't responsible.
If Dropbox is storing your files -- then the TLA can always request Dropbox to give a list of users that have the same file. (Unless you have some form of independent crypto/hashing)
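A sketch of that "unless" clause, with the usual assumptions: the indexes below are a hypothetical server-side view (ciphertext digest to users), and `toy_encrypt` is a SHA-256 keystream standing in for AES. With content-derived keys, anyone who has the file can compute the stored ciphertext and ask who holds it; with independent per-user random keys, the same probe matches nothing.

```python
import hashlib
import os

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def toy_encrypt(data: bytes, key: bytes) -> bytes:
    """Stand-in for AES(X, K), NOT real AES: XOR with a SHA-256 counter keystream."""
    out = bytearray()
    for i in range(0, len(data), 32):
        pad = hashlib.sha256(key + (i // 32).to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[i:i + 32], pad))
    return bytes(out)

def blob_id(ciphertext: bytes) -> str:
    return hashlib.sha256(ciphertext).hexdigest()

known_file = b"contents of a file the investigator already has"
users = ("alice", "bob")

# Case 1: keys derived from content, as in the scheme above.
convergent_index = {}
for user in users:
    ct = toy_encrypt(known_file, H(known_file))
    convergent_index.setdefault(blob_id(ct), set()).add(user)

# Case 2: each user encrypts under an independent random key.
random_index = {}
for user in users:
    ct = toy_encrypt(known_file, os.urandom(32))
    random_index.setdefault(blob_id(ct), set()).add(user)

# The investigator computes the ciphertext from the plaintext and looks it up.
probe = blob_id(toy_encrypt(known_file, H(known_file)))
holders_convergent = convergent_index.get(probe, set())
holders_random = random_index.get(probe, set())
print(sorted(holders_convergent), sorted(holders_random))
```

The second case is also exactly the one where the server can no longer de-duplicate, so the two properties trade off against each other.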