GPUs are still about 5x ahead of CPUs. CPUs lack texture lookup hardware; that's the biggest catch right now. AVX2 adds gather support (one instruction performs up to 8 independent loads into a single vector register), but on current Haswell parts it offers no performance benefit over doing the memory loads individually.
GPU memory (GDDR5) also has many times the bandwidth, at the expense of roughly 10x higher latency than CPU memory.
Currently you put workloads that require low latency on the CPU and embarrassingly parallel, "non-branchy" workloads on the GPU.
Both CPUs and GPUs gain about 20% performance per core per year. I'm no expert on CPU and GPU memory architectures, but both seem to be headed toward integrated 3D-stacked eDRAM, which could provide up to several TB/s of bandwidth at reasonably low latency.