Engineers boost AMD CPU by 20% with software alone; no overclocking (extremetech.com)
94 points by 11031a on Feb 7, 2012 | 17 comments



It sounds like the NCSU guys are using the CPU as a prefetcher to speed up GPU kernel execution, not using the GPU to speed up normal CPU programs as the ExtremeTech article implies.

The CPU parses the GPU kernel and creates a prefetcher program that contains the load instructions of the GPU kernel. This prefetcher runs on the CPU, but slightly ahead of kernel execution on the GPU. This warms up the caches, so that when the GPU executes a load instruction, the data is already there.
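
To make that concrete, here is a minimal run-ahead prefetcher sketch (not the researchers' actual code): the "GPU kernel" is stood in for by a plain C thread, a shared last-level cache between CPU and GPU is assumed, and the LEAD distance is an arbitrary placeholder.

    /* Sketch of the "CPU runs ahead and prefetches for the GPU" idea.
     * Assumptions (not from the paper): CPU and GPU share the last-level
     * cache, and the GPU kernel is stood in for by a plain C thread.
     */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N    (1 << 20)
    #define LEAD 4096              /* how far the prefetcher runs ahead  */

    static float a[N], b[N], out[N];
    static int   idx[N];           /* indirect indices: hard for a HW
                                      prefetcher, easy in software       */
    static volatile long consumed; /* element the kernel stand-in is on  */

    /* Stand-in for the GPU kernel: gathers through idx[] and multiplies. */
    static void *gpu_kernel(void *arg)
    {
        (void)arg;
        for (long i = 0; i < N; i++) {
            out[i] = a[idx[i]] * b[idx[i]];
            consumed = i;
        }
        return NULL;
    }

    /* CPU-side prefetcher: walks the same index stream, LEAD elements
     * ahead of the consumer, touching the cache lines it will need.     */
    static void *cpu_prefetcher(void *arg)
    {
        (void)arg;
        for (long i = 0; i < N; i++) {
            while (i > consumed + LEAD)   /* don't run too far ahead     */
                ;                         /* (busy-wait keeps it simple) */
            __builtin_prefetch(&a[idx[i]], 0, 1);
            __builtin_prefetch(&b[idx[i]], 0, 1);
        }
        return NULL;
    }

    int main(void)
    {
        for (long i = 0; i < N; i++) {
            a[i] = b[i] = (float)i;
            idx[i] = rand() % N;
        }

        pthread_t pf, gk;
        pthread_create(&pf, NULL, cpu_prefetcher, NULL);
        pthread_create(&gk, NULL, gpu_kernel, NULL);
        pthread_join(gk, NULL);
        pthread_join(pf, NULL);

        printf("out[42] = %f\n", out[42]);
        return 0;
    }

The point of the indirect idx[] accesses is that a hardware prefetcher can't predict them, but a helper thread that already knows the kernel's load stream can touch the right cache lines just in time.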


Yes, you are right. In fact we don't have to infer this. The researchers state it directly in their abstract, "...a novel approach to utilize the CPU resource to facilitate the execution of GPGPU programs..."

Here it is: http://news.ncsu.edu/releases/wmszhougpucpu/


> It sounds like the NCSU guys are using the CPU as a prefetcher to speed up GPU kernel execution, not using the GPU to speed up normal CPU programs as the ExtremeTech article implies.

The article says the same thing you are -- that the CPU is used as a prefetcher for the GPU; read the 3rd paragraph:

To achieve the 20% boost, the researchers reduce the CPU to a fetch/decode unit, and the GPU becomes the primary computation unit. This works out well because CPUs are generally very strong at fetching data from memory, and GPUs are essentially just monstrous floating point units. In practice, this means the CPU is focused on working out what data the GPU needs (pre-fetching), the GPU’s pipes stay full, and a 20% performance boost arises.


This is only tangentially related, but with a title like that I was expecting a brainless regurgitation of a press release, or some kind of extrapolation from a paper that made no such claim at all.

Instead, I see a news article with a clear description, caveats and constraints clearly listed, and a portion of how this relates to the parent company. It's a shame that I find this surprising.


The fact that it's only a 20% increase actually makes it sound promising. Normally press releases will boast about "100x" increases in speed when they switch to using the GPU, and you can get that sort of increase for highly parallel tasks with low memory pressure -- Bitcoin mining, for example. But the modest 20% speedup implies that they're doing this for general-purpose computing.
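
A roofline-style back-of-the-envelope shows why memory pressure caps the gains; the hardware numbers below are placeholders I made up, not measurements from the paper or the article.

    /* Roofline-style back-of-the-envelope: attainable throughput is
     * min(peak compute, memory bandwidth * arithmetic intensity).
     * All hardware numbers are made-up placeholders, not measurements.
     */
    #include <stdio.h>

    static double attainable_gops(double peak_gops, double bw_gbs,
                                  double ops_per_byte)
    {
        double mem_bound = bw_gbs * ops_per_byte;   /* bandwidth ceiling */
        return mem_bound < peak_gops ? mem_bound : peak_gops;
    }

    int main(void)
    {
        /* Placeholder figures for a CPU and a GPU sharing one memory bus. */
        double cpu_peak = 50.0, gpu_peak = 500.0;   /* Gop/s              */
        double bus_bw   = 20.0;                     /* GB/s, shared       */

        /* Compute-heavy task (mining-style hashing): ~50 ops per byte.   */
        printf("compute-bound: GPU/CPU = %.1fx\n",
               attainable_gops(gpu_peak, bus_bw, 50.0) /
               attainable_gops(cpu_peak, bus_bw, 50.0));

        /* Memory-heavy task (streaming, pointer chasing): ~0.5 ops/byte. */
        printf("memory-bound:  GPU/CPU = %.1fx\n",
               attainable_gops(gpu_peak, bus_bw, 0.5) /
               attainable_gops(cpu_peak, bus_bw, 0.5));
        return 0;
    }

With those placeholder numbers, the compute-bound kernel gets a 10x GPU advantage while the memory-bound one gets none at all, which is exactly the regime where a prefetching helper that merely keeps the pipes fuller would buy you something like 20%.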


That's hilarious: using a whole CPU to prefetch data because of poor shared-bus performance for the CPU and GPU together (?!). Instead of such a crazy "software solution", I would prefer to dedicate a portion of the L2 or L3 cache (e.g. 1MB of a 3MB L2/L3) to the GPU itself and reduce bus saturation with DMA transfers (much like the SPE units of the Cell CPU work).
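
For what it's worth, the Cell-style pattern being alluded to is double-buffered DMA into a reserved local store: pull chunk N+1 in while computing on chunk N. A rough sketch, with dma_get()/dma_wait() as synchronous stand-ins rather than the real MFC calls:

    /* Sketch of Cell-SPE-style double buffering: DMA the next chunk into
     * local memory while computing on the current one. dma_get()/dma_wait()
     * are stand-ins (a plain memcpy here); on real hardware they would be
     * asynchronous transfers tracked by tag.
     */
    #include <stdio.h>
    #include <string.h>

    #define CHUNK 4096
    #define TOTAL (CHUNK * 64)

    static void dma_get(float *local, const float *remote, size_t n, int tag)
    {
        (void)tag;
        memcpy(local, remote, n * sizeof(float)); /* async on real hardware */
    }
    static void dma_wait(int tag) { (void)tag; }  /* would block on the tag */

    static float src[TOTAL], sink;

    int main(void)
    {
        static float buf[2][CHUNK];
        int cur = 0;

        for (int i = 0; i < TOTAL; i++) src[i] = (float)i;

        dma_get(buf[cur], src, CHUNK, cur);
        for (int off = 0; off < TOTAL; off += CHUNK) {
            int nxt = cur ^ 1;
            if (off + CHUNK < TOTAL)              /* start fetching ahead */
                dma_get(buf[nxt], src + off + CHUNK, CHUNK, nxt);

            dma_wait(cur);                        /* current chunk ready  */
            for (int i = 0; i < CHUNK; i++)       /* "compute" on it      */
                sink += buf[cur][i];
            cur = nxt;
        }
        printf("checksum: %f\n", sink);
        return 0;
    }

On real hardware the dma_get() calls would be asynchronous, so the compute loop overlaps with the next transfer instead of stalling on the shared bus.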


So by using a custom compiler, someone sped up an unspecified benchmark by 20%. Is this news?


If that were all it was, then no.

However, what they did was demonstrate a novel way of using the two different processing cores that exist on the chip (the CPU and an integrated GPU) to improve the performance of their benchmark - which certainly is both interesting and news.

Of course, a proof of concept is a long, long way from being of practical benefit!


A very, very long way, I would guess. A 20% performance gain is nice, but having to power a GPU to get it is not. I would expect that adding a second CPU instead of that GPU will almost always give you more than that 20% performance, with less heat, for less money.


It depends on the application. If the application, as the article puts it, "pushes polygons around", then I imagine the APU concept may have the advantage.

Though, as previously noted, this APU concept is highly dependent on tailored software (compilers, etc.), and AMD has been betting its strategy on those critical pieces taking advantage of the APU.

I think the NCSU research (co-sponsored by AMD) is a move in the right direction for determining whether these APUs are an effective solution compared to multi-CPU architectures.


GPUs are pretty specialized and aren't really good for general-purpose computing. Branching and cache coherency are much easier on a CPU than on a GPU. I doubt that any of the advertised gains would be realized by normal users.


It was the GPU that ran 20% faster by leveraging the CPU, not the other way around.


I hope this has something to do with the HSAIL virtual ISA. For example: general-purpose C code compiled to HSAIL, and then the CPU makes intelligent decisions about which parts of the code to JIT-compile for the CPU and which for the GPU...
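
The kind of dispatch that wish implies might look roughly like the sketch below; the heuristic, the kernel_info fields, and the thresholds are entirely made up and are not part of HSAIL or anything AMD has announced.

    /* Hypothetical sketch of a runtime choosing a JIT target per kernel.
     * The heuristic and the kernel_info fields are invented for
     * illustration; HSAIL/HSA does not specify any of this.
     */
    #include <stdio.h>

    enum target { TARGET_CPU, TARGET_GPU };

    struct kernel_info {
        long   work_items;         /* how much data parallelism is exposed */
        double ops_per_byte;       /* arithmetic intensity of the region   */
        int    divergent_branches; /* branches that differ per work item   */
    };

    static enum target pick_target(const struct kernel_info *k)
    {
        /* Small, branchy, or memory-bound regions stay on the CPU;
         * wide and arithmetic-heavy regions go to the GPU.               */
        if (k->work_items < 10000)     return TARGET_CPU;
        if (k->divergent_branches > 8) return TARGET_CPU;
        if (k->ops_per_byte < 1.0)     return TARGET_CPU;
        return TARGET_GPU;
    }

    int main(void)
    {
        struct kernel_info matmul = { 1 << 20, 16.0, 0 };
        struct kernel_info parser = { 2000,     0.3, 40 };
        printf("matmul -> %s\n",
               pick_target(&matmul) == TARGET_GPU ? "GPU" : "CPU");
        printf("parser -> %s\n",
               pick_target(&parser) == TARGET_GPU ? "GPU" : "CPU");
        return 0;
    }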


Hopefully we can get this into drivers sooner rather than later. AMD has already been working with Microsoft to get a large performance gain out of Bulldozer chips in Windows 8 simply by changing the way threads are prioritized.


The title is deceiving, as per usual. It's mostly using the GPU, with the CPU doing the prefetching. Nothing too new here; we know the GPU is faster.


> we know the GPU is faster

More specifically, the GPU is more parallel.


summary: they send all the instructions to the CPU to simply encode them for the GPU, and let the GPU do the heavy lifting.

And they end up saying that AMD is dying, as the news loves to do.



