
I've heard that before about floating-point co-processors; they're definitely unified now.

Remember the Weitek and the 387?

In the end it's a cost-savings issue: as CPUs get more cores they become more and more like GPUs, and as GPUs acquire more general-purpose instructions they become more like CPUs. Those lines will meet, and once they're on one die the 'budget' can be used more efficiently by looking for ways to integrate them more tightly.

There is nothing inherently different about general-purpose computation versus the kind of vector processing a GPU is good at; at the end of the day it is all calculation, and more and more of it in parallel.
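
To make that concrete, here is a minimal sketch of the same computation written both ways (the names, sizes and the SAXPY choice are mine, purely for illustration): a plain C loop for the CPU next to a CUDA kernel where each thread handles one element.

    // Same a*x + y computation, scalar vs. data-parallel.

    // CPU version: one core walks the whole array.
    void saxpy_cpu(int n, float a, const float *x, float *y) {
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

    // GPU version (CUDA): one element per thread; the "loop"
    // is implicit in the launch grid.
    __global__ void saxpy_gpu(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            y[i] = a * x[i] + y[i];
    }

    // Launch, e.g.: saxpy_gpu<<<(n + 255) / 256, 256>>>(n, a, d_x, d_y);

The arithmetic is identical in both; only the way the hardware is asked to schedule it differs.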

I fully expect that at some point even DRAM will be part of the CPU.




Point, but I'm not sure they're comparable. Both CPU and GPU are something you can never have enough of, because their task list expands to meet supply. Contrast: once you have enough FPU, you're done.


It's a game of bottlenecks: solve one, you get another one for free.

Once you have 'enough' FPU you no longer have enough memory bandwidth, so you go wider/faster on the memory bus (this is already happening; we are now well over 1 GHz on the memory bus), or you place the memory closer to the CPU (also happening, in the form of ever larger caches).
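
A quick back-of-envelope check shows why the FPU tends to starve before the memory bus catches up. The peak numbers below are made-up round figures, not any particular card; only the ratio matters.

    // Roofline-style check: is a simple multiply-add over an array
    // (like the SAXPY sketch earlier in the thread) compute- or
    // memory-bound? Peak figures are illustrative placeholders.
    #include <stdio.h>

    int main(void) {
        double peak_gflops = 1000.0;  // assumed peak compute, GFLOP/s
        double peak_gbps   = 150.0;   // assumed memory bandwidth, GB/s

        // SAXPY: 2 FLOPs (mul + add) per element, 12 bytes moved
        // (read x, read y, write y, 4 bytes each).
        double intensity  = 2.0 / 12.0;             // FLOPs per byte
        double attainable = intensity * peak_gbps;  // GFLOP/s if memory-bound

        if (attainable < peak_gflops)
            printf("memory-bound: ~%.0f of %.0f GFLOP/s usable\n",
                   attainable, peak_gflops);
        else
            printf("compute-bound\n");
        return 0;
    }

With those numbers the arithmetic units sit idle most of the time, which is exactly the pressure that drives wider buses and bigger caches.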

Then as soon as that is done you no longer have 'enough' FPU, so you go parallel.

GPUs now have almost 250 cores, and yet there are plenty of people who use more than one in a single machine (I've seen up to 4 of them, with two dies each, for almost 2000 cores). Clearly some people don't have 'enough' FPU yet, and plenty of games would happily add more effects to their engines (which seems to be the biggest driver of this kind of development outside of hard-core number crunching).

Higher-resolution displays are another driver for more FPU power, because once you start shading, every pixel becomes the end point of a long pipeline of mathematical operations.
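
As a toy illustration of that pipeline (my own sketch, nothing like a real engine's shader), here is a CUDA kernel that runs one thread per pixel and a handful of floating-point steps per pixel; double the resolution and the FLOP count quadruples.

    // Toy per-pixel diffuse shading: each pixel is the end point of a
    // short chain of floating-point ops (normalize, dot product, clamp).
    // Assumes non-degenerate normals and a pre-normalized light_dir.
    __global__ void shade(float3 *out, const float3 *normals,
                          float3 light_dir, int width, int height) {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;

        int i = y * width + x;
        float3 n = normals[i];

        // normalize the surface normal
        float len = sqrtf(n.x * n.x + n.y * n.y + n.z * n.z);
        n.x /= len; n.y /= len; n.z /= len;

        // Lambert term: N . L, clamped at zero
        float d = n.x * light_dir.x + n.y * light_dir.y + n.z * light_dir.z;
        d = fmaxf(d, 0.0f);

        out[i] = make_float3(d, d, d);  // grey-scale diffuse result
    }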

I don't foresee anybody complaining of 'too much FPU' in the next decade or longer; in fact, I suspect that once this kind of FPU capacity becomes mainstream we'll see a whole new breed of applications that take advantage of it.


I'm not sure they're as different as you are making out. Why couldn't you have a chip that's somewhere between a CPU, a GPU and an ASIC - one that rewires itself to trade off cache+prediction vs. many ALUs+parallelism? Maybe the ASIC part is a bit extreme, but Fermi (the next-gen Nvidia architecture) will have on-chip memory that is switchable between explicit local shared memory (GPU-style) and implicit local cache (CPU-style). Why couldn't the same kind of flexible approach be applied to instructions - longer+fewer vs. more+shorter pipelines?
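
For reference, the CUDA runtime exposes that Fermi shared-memory/L1 split as a per-kernel preference. A minimal sketch, assuming a Fermi-class card and using a throwaway placeholder kernel:

    // Fermi splits 64 KB of per-multiprocessor on-chip memory between
    // explicit shared memory and implicit L1 cache; the runtime lets
    // you state a per-kernel preference for how to divide it.
    #include <cuda_runtime.h>

    // placeholder kernel, only here so the sketch is self-contained
    __global__ void my_kernel(float *data) {
        if (data) data[0] = 0.0f;
    }

    void configure(void) {
        // Lean toward CPU-style implicit caching for this kernel...
        cudaFuncSetCacheConfig(my_kernel, cudaFuncCachePreferL1);
        // ...or toward classic GPU-style explicit shared memory:
        // cudaFuncSetCacheConfig(my_kernel, cudaFuncCachePreferShared);
    }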



