Storing it really is the trivial and cheap part. There are other really hard problems to solve:
(1) Where do you place collection points so you get a full take of not only international traffic (moving across your borders) but also domestic traffic (all traffic that stays within your borders)?
(2) Since the number of collection points is limited, that means there is a lot of data that has to be recorded at select points. How do you record that data to disk in real time?
(3) How do you avoid the duplication of packets that travel through multiple collection points?
(4) Lastly, the most difficult problem is figuring out how to query all that data and not end up with a haystack. When you have millions and millions of pieces of communication from people with no involvement in the criminal activity, then all that communication becomes noise.
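For (3), one possible approach is to fingerprint each packet and keep only the first copy seen. The sketch below is only an illustration under loud assumptions: the names `packet_key` and `Deduplicator` are made up here, the bytes passed in are assumed to already have per-hop header fields (TTL, checksums) stripped, and the seen-set is unbounded, where a real system would need a time window. It would also collapse genuinely repeated identical payloads.

```python
import hashlib


def packet_key(payload: bytes) -> bytes:
    """Fingerprint a packet by hashing its (assumed hop-invariant) bytes."""
    return hashlib.sha256(payload).digest()


class Deduplicator:
    """Remembers fingerprints of packets already recorded (hypothetical helper)."""

    def __init__(self) -> None:
        self._seen: set[bytes] = set()

    def is_new(self, payload: bytes) -> bool:
        """Return True the first time a payload is seen, False for repeats."""
        key = packet_key(payload)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


if __name__ == "__main__":
    dedup = Deduplicator()
    # The same packet captured at two collection points is stored once.
    for point, payload in [("A", b"hello"), ("B", b"hello"), ("B", b"world")]:
        print(point, payload, "store" if dedup.is_new(payload) else "skip")
```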
1) If it's your own country's traffic, at the choke points in the network, which you (i.e. the government) have full control over, since they are domestic.
2) Since you already have 100,000 disks, run 5000 of them in parallel. Each disk interface does 200MB/s, for an aggregate of 1TB/s. If necessary, use fibre to transport data from the collection point to the storage point.
3) Don't. Record the lot.
4) Spend a billion on a supercomputer?
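The arithmetic in point 2 can be checked with a few lines. This is just a back-of-the-envelope sketch using the figures quoted above (5000 disks, 200MB/s each) and decimal units; the function names are made up for illustration, and sustained real-world throughput would likely be lower than the interface rate.

```python
import math

DISK_MB_PER_S = 200    # per-disk interface rate quoted above
PARALLEL_DISKS = 5000  # disks written in parallel, as quoted above


def aggregate_tb_per_s(disks: int, mb_per_s: int) -> float:
    """Aggregate write bandwidth in TB/s (decimal: 1 TB = 1,000,000 MB)."""
    return disks * mb_per_s / 1_000_000


def disks_needed(target_tb_per_s: float, mb_per_s: int) -> int:
    """Minimum number of parallel disks to sustain a target ingest rate."""
    return math.ceil(target_tb_per_s * 1_000_000 / mb_per_s)


if __name__ == "__main__":
    print(aggregate_tb_per_s(PARALLEL_DISKS, DISK_MB_PER_S), "TB/s")
    print(disks_needed(1.0, DISK_MB_PER_S), "disks for 1 TB/s")
```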
I think it comes down to the bandwidth of computers now vastly exceeding the bandwidth of human thought. It is now possible to record a person's entire life, with plenty of headroom to spare to account for any attempt to overwhelm the recording system.