The bottleneck depends on the workload. If you're training a small, fast network, the GPU burns through each batch quickly, so host-to-device data bandwidth becomes a real problem.
That being said, for most cases a workstation build that provides every GPU with a full 16 lanes is far less cost-effective than a consumer platform that runs the GPUs at x8.
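If you're not sure which side of that line your workload falls on, a rough check is to time the host-to-device copy against the compute step. Below is a minimal PyTorch sketch; the model, batch size, and iteration counts are placeholders I've made up for illustration, not anything from the original discussion. If the copy time dominates, PCIe bandwidth (lane count) is your bottleneck; if compute dominates, extra lanes buy you little.

```python
import time
import torch

device = torch.device("cuda")

# Stand-in for a "small/fast" network -- swap in your own model.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
).to(device)

# Pinned host memory gives the copy a fair shot at full PCIe bandwidth.
batch = torch.randn(4096, 1024).pin_memory()

# Time 100 host-to-device copies.
torch.cuda.synchronize()
t0 = time.perf_counter()
for _ in range(100):
    x = batch.to(device, non_blocking=True)
torch.cuda.synchronize()
copy_time = time.perf_counter() - t0

# Time 100 forward/backward passes on data already on the GPU.
x = batch.to(device)
torch.cuda.synchronize()
t0 = time.perf_counter()
for _ in range(100):
    model.zero_grad(set_to_none=True)
    out = model(x)
    out.sum().backward()
torch.cuda.synchronize()
compute_time = time.perf_counter() - t0

print(f"copy: {copy_time:.3f}s  compute: {compute_time:.3f}s")
```

If copy time is a small fraction of compute time, dropping from x16 to x8 will barely show up in wall-clock training speed, which is why the cheaper build usually wins.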