The reference to rsync in Dropbox's YCombinator application is a bit of a tipoff --- rsync uses exactly this technique to avoid recopying files that already exist at the destination.
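The trick rsync relies on is a weak rolling checksum that can be slid across a file one byte at a time in O(1), so matching blocks at the destination can be found cheaply. A minimal sketch of that idea (the modulus and function names here are illustrative; real rsync pairs this weak checksum with a strong checksum to confirm matches):

```python
M = 1 << 16  # modulus for the Adler-32-style weak checksum

def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the two-part weak checksum of a block directly."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b

def roll(a: int, b: int, n: int, old: int, new: int) -> tuple[int, int]:
    """Slide the window one byte forward in O(1) instead of rehashing."""
    a = (a - old + new) % M
    b = (b - n * old + a) % M
    return a, b

data = b"the quick brown fox jumps over the lazy dog"
n = 16
a, b = weak_checksum(data[0:n])
# Rolling the window forward gives the same result as recomputing:
a, b = roll(a, b, n, data[0], data[n])
assert (a, b) == weak_checksum(data[1:n + 1])
```

The O(1) update is what makes it practical to test every byte offset of a large file against a set of known block checksums.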
Yep... I never dug much into that reason, but that's why I quoted it and quoted the definition of 'Diff'. I guess I never brought it home as coherently as I wanted to.
I thought this was common knowledge. It wouldn't make sense to store Pirated.Movie.DVDRiP.avi thousands of times if it's the same file for thousands of users. Files that hash to the same value get served from a single copy on Dropbox's servers.
It's a pretty common optimization. They had a good plan, and executed well, but let's not get carried away; other people were working on similar ideas. I'd bet that data stored in S3 is deduplicated in a similar manner.
Another plus of their setup is that the file's hash is calculated on your machine: you pay them, yet your own computer does the hashing, and a file is only uploaded if they haven't seen it before.
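The client-side flow being described can be sketched as content-addressed storage: hash locally, ask the server if it already has that content, and transfer bytes only on a miss. This is a minimal illustrative model, not Dropbox's actual API; the `STORE` dict stands in for their backend:

```python
import hashlib

# Hypothetical server-side store mapping content hash -> file bytes.
STORE: dict[str, bytes] = {}

def upload(data: bytes) -> tuple[str, bool]:
    """Client-side dedup: hash the file locally and send the bytes
    only if the server hasn't seen that hash. Returns (hash, sent?)."""
    digest = hashlib.sha256(data).hexdigest()
    if digest in STORE:
        return digest, False   # server already has it; no transfer
    STORE[digest] = data       # first uploader pays the bandwidth
    return digest, True

# Two users "upload" the same file; only the first one sends bytes.
h1, sent1 = upload(b"Pirated.Movie.DVDRiP.avi contents")
h2, sent2 = upload(b"Pirated.Movie.DVDRiP.avi contents")
assert h1 == h2 and sent1 and not sent2
```

Every later uploader of the same content gets an instant "upload" because only the hash crosses the wire.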
Well... I'm not 100% sure, but that's the only logical explanation, given that the 241MB file from the example I gave you uploaded in a minute or two.
Well... considering the alternative, that's the least of it. The alternative being that they store every file X times, where X is the number of users who upload that file.
The money wasted on bandwidth from the redundant transfers is minimal in comparison.