OK, now that I've thought about it, the easiest way around this is probably to throw money at hardware (more disks) or to optimize the processing. However, that only applies if you have a true disk I/O bottleneck. If you're optimizing correctly, the disks should be reading 8GB blocks directly into memory and the CPU should be spitting them right back out. At the very minimum, you should be using an optimized file system with large pages enabled in your kernel.
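One rough way to check whether the disks really are the bottleneck is to time a plain sequential read in large blocks and compare it with the rate your pipeline actually achieves. This is only a sketch: the function name `read_throughput` and the 64 MiB block size are my own choices for illustration, and the OS page cache can inflate the number if the file was read recently.

```python
import time

def read_throughput(path, block_size=64 * 1024 * 1024):
    """Read a file sequentially in large blocks.

    Returns (bytes_read, bytes_per_second). The block size is an
    arbitrary illustrative value, not a tuned recommendation.
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 skips Python's own buffer; the OS cache still applies.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:  # empty bytes means end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    if elapsed > 0:
        rate = total / elapsed
    else:
        rate = float("inf")
    return total, rate
```

If the raw read rate is far above what the processing sustains, the problem is probably not the disks.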
I don't think I have enough low-level knowledge to answer that intelligently at the moment. What I can say is that we're hitting these problems despite being on Isilon drives. (I think that is orthogonal to your solution, but again, I'm not all that familiar with the subject.)
Not really... these datasets start to saturate 10Gb network connections very easily, so it's a question of volume. With some instruments, you can generate 10-20 terabytes at a time.
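To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes decimal units (1 TB = 10^12 bytes, 10 Gb/s = 10^10 bits per second) and a link running at full line rate with no protocol overhead, which is optimistic; `transfer_hours` is just a name I made up for the calculation.

```python
def transfer_hours(terabytes, link_gbps=10.0):
    """Hours needed to move `terabytes` over a link of `link_gbps`.

    Assumes decimal units and 100% link utilization (an idealization).
    """
    bits = terabytes * 1e12 * 8           # TB -> bytes -> bits
    seconds = bits / (link_gbps * 1e9)    # bits / (bits per second)
    return seconds / 3600.0

for tb in (10, 20):
    print(f"{tb} TB over 10 Gb/s: {transfer_hours(tb):.1f} h")
```

So even in the ideal case, a single 10-20 TB run ties up a 10Gb link for roughly two to four and a half hours.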