Would it even be possible to exfiltrate all that data? It's hard to imagine even storing all that they're generating in a single day. What's the most an attacker could likely get? Usernames & passwords for all users? Complete profile/album/comment data on a few tens of millions?
Very insightful question. The only way I could see this working is a botnet that extracts the data externally, encrypts and stores it locally, with each node then serving up its cache over BitTorrent to a command-and-control system.
It's just so much data. The more valuable data, arguably, would be plain text profile data about people, not their photos.
Assuming 7 billion people in the world (very conservative, for funsies), each profile containing 10 megabytes of profile data (excluding photos, just textual data), uncompressed, that would be 70 petabytes. A lot of data to be sure, but not insurmountable. Compressed, you could probably get down to 30-40 PB depending on compressibility.
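For anyone who wants to check the 70 PB figure, here is a quick sketch of the arithmetic. It assumes decimal (SI) units, i.e. 1 MB = 10^6 bytes and 1 PB = 10^15 bytes; the function name `total_bytes` is just illustrative.

```python
# Back-of-the-envelope check of the 70 PB estimate.
# Assumption: decimal (SI) units, 1 MB = 10**6 bytes, 1 PB = 10**15 bytes.
MB = 10**6
PB = 10**15


def total_bytes(users: int, bytes_per_user: int) -> int:
    """Total dataset size if every user has the same amount of data."""
    return users * bytes_per_user


users = 7_000_000_000      # world population figure used in the comment
per_user = 10 * MB         # 10 MB of textual profile data per person

total = total_bytes(users, per_user)
print(total / PB)          # 70.0
```

With binary units (MiB/PiB) the result shifts a little, but not enough to change the conclusion.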
10 megabytes of text data per user? You are orders of magnitude off. A list of friends, all posts and private messages. After compressing it will probably average 10 kilobytes per user.
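To put a number on "orders of magnitude", here is a rough sketch comparing the two per-user estimates for the same 7 billion users. It assumes decimal units, and note the comparison is not quite apples to apples: the 10 KB figure is compressed while the 10 MB figure is uncompressed. `dataset_size` is an illustrative helper, not anything from a real tool.

```python
# Compare the two per-user estimates from the thread.
# Assumption: decimal (SI) units throughout.
KB = 10**3
MB = 10**6
TB = 10**12

USERS = 7_000_000_000


def dataset_size(users: int, bytes_per_user: int) -> int:
    """Total size in bytes for a uniform per-user footprint."""
    return users * bytes_per_user


small = dataset_size(USERS, 10 * KB)   # 10 KB per user (compressed)
large = dataset_size(USERS, 10 * MB)   # 10 MB per user (uncompressed)

print(small / TB)        # 70.0  -> about 70 TB in total
print(large // small)    # 1000  -> three orders of magnitude apart
```

So at 10 KB per user the whole thing is roughly 70 TB, which is a very different problem from 70 PB.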
Remember that time a guy from Europe asked Facebook to give him everything they knew about him (which European citizens are allowed to request by law) and got over 1000 pages of information from Facebook?
Now, multiply that by a billion (well, two or three billion actually).
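A rough sketch of that multiplication, for what it's worth. The bytes-per-page figure is purely my assumption (about 3,000 bytes of plain text per page); the thread only gives the page count, and one person's 1000+ pages may not be typical of every user.

```python
# Rough size of "1000 pages per user" across one to three billion users.
# ASSUMPTION: ~3,000 bytes of plain text per page. This number is not
# from the thread; change it to taste.
BYTES_PER_PAGE = 3_000
PAGES_PER_USER = 1_000
MB = 10**6
PB = 10**15


def per_user_bytes(pages: int, bytes_per_page: int = BYTES_PER_PAGE) -> int:
    """Approximate plain-text size of a per-user data dump."""
    return pages * bytes_per_page


per_user = per_user_bytes(PAGES_PER_USER)
print(per_user / MB)                     # 3.0 (MB per user)

for users in (1_000_000_000, 2_000_000_000, 3_000_000_000):
    print(users, per_user * users / PB)  # total in PB for each user count
```

Under that assumption each user comes out at a few megabytes, so the total lands in the single-digit petabyte range.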
I only know about the Netherlands, but other European countries might be similar:
- When an organization collects data about you, they are forced to tell you about it (unless they're police or something I guess) and tell you what they are going to do with it. In Dutch this is called the "Informatieplicht persoonsgegevens".
- Upon reasonable request, an organization must give you all information they can reasonably give you. "Reasonably" means, given a normal amount of effort. If they need to contact the garbage collectors and dig up an old cassette they threw away years ago, that is unreasonable to ask. They may also charge a fee, but the limit is quite low I think.
- You can ask an organization to correct or remove your personally identifiable information if you have a good reason or if they have no reason not to. For example if you ask to remove your IP address from logs and they want to keep it for security purposes for 4 weeks, their argument sounds pretty reasonable (unless you have some better reason).
- They cannot keep personally identifiable information for longer than necessary. For example, log files from the web server may be kept for security purposes, but if you have ten-year-old log files, that is too long to be reasonable for that purpose and is thus illegal.
One thing I've always wondered about is how applicable this is to foreign organizations with websites accessible from the Netherlands. I've heard some people say that for companies with customers in the Netherlands, Dutch law applies and those customers have the above rights. Nobody seems to comply with that, though. Another thing I've heard is that if they have an office here, that office can be held to our laws. I should look that up some time.
I remember the case in which one guy (I think he was Irish) did send that request to Facebook and got a response over one thousand pages long.
Then a shit ton of people from Europe (like, tens of thousands) tried to do the same through a combined court case in Vienna. The court in Vienna decided that the issue was "not admissible on procedural grounds", and those thousands of people got generic responses like: "You can download all the data we store on you by going into your settings...". There was an appeal against that decision, and then the whole thing died out.
They keep a lot of data in cold storage as well, so I'm not sure how easy it would be to get everything. Probably just whatever data they think is likely to be accessed soon and sits in relatively ephemeral storage.