I work in the FPGA industry and I've supervised university projects, seen lots o...

glangdale · on June 23, 2018

Thanks for the detailed reply. My post may have seemed like partly-informed sour grapes but your information fits in well with what I've seen.

In a number of the applications I've seen the other killers are the fact that not only do you have the transfer costs you mentioned to the device, you also:

1. Have to get information back from the device - and in regular expression matching this might be 1 match in 1000 or 1 match in 5 if you're unlucky, and

2. Have to have a lot of parallelism to hit peak performance, yielding great throughput but so-so latency. At Sensory Networks during our hardware stage, we had a "2 Gbps regex accelerator" (hah) that didn't even hit that modest number on a single stream - it actually required 14 streams or so running at 142Mbps.

Many of the same sins are repeated for GPGPU.

The other thing that I notice is that the "noddy s/w solution" sometimes is the only thing out there. I looked at some accelerator work on Random Forest inference (not training) and - wow - all the RF implementations are naive. There are a lot of s/w tasks out there that no-one has bothered to optimize with any effort at all.

However, when your adviser says "make a GPGPU/FPGA thesis" I think a smart PhD just goes and does that, rather than sinking 6 months into building a really great s/w comparison. :-)