If you think GPUs are too power hungry then you're in for a shock with FPGAs. Switching in FPGAs is incredibly power hungry since they run on much larger processes.
FWIW most modern FPGAs use discrete DSPs anyway so you're not really getting the flexibility at that level.
The processes used for FPGAs are VERY competitive with GPUs. Stratix 10 is at 14nm, for example. Stratix V was built on a 28nm process, and that was at least 5 years ago - on par with or ahead of NVidia.
You can fuse more operations into the DSPs on an FPGA, and/or you can perform fewer operations per FLOP. One example is avoiding rounding and packing/unpacking between stages when building deep floating-point pipelines.
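Roughly what I mean, as a software analogy only (this isn't FPGA code, and the function names are mine): rounding back to float after every operation vs. keeping a wider intermediate and rounding once at the end. An FPGA pipeline gets a similar win by skipping the normalize/round/pack stage between fused DSP operations.

    #include <stdio.h>

    float dot_round_each_step(const float *a, const float *b, int n) {
        float acc = 0.0f;                     /* rounded to float after every add */
        for (int i = 0; i < n; i++)
            acc += a[i] * b[i];
        return acc;
    }

    float dot_round_once(const float *a, const float *b, int n) {
        double acc = 0.0;                     /* wider intermediate, no per-step rounding */
        for (int i = 0; i < n; i++)
            acc += (double)a[i] * (double)b[i];
        return (float)acc;                    /* single rounding at the end */
    }

    int main(void) {
        float a[] = {1e8f, 1.0f, -1e8f, 1.0f};
        float b[] = {1.0f, 1.0f, 1.0f, 1.0f};
        /* prints "1 vs 2": per-step rounding loses one of the additions */
        printf("%g vs %g\n", dot_round_each_step(a, b, 4), dot_round_once(a, b, 4));
        return 0;
    }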
A Stratix V board will not consume more than 60W when used as a PCI Express card. The requirement for a Xeon Phi is at least 250W.
With a difference of ~200W, that's ~4.5 kWh per day, or ~1600 kWh and $160 in hard cash per year (at US average rates). Very probably more - getting rid of the heat produced, etc.
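The back-of-the-envelope math, assuming roughly $0.10/kWh (approximate US average rate):

    #include <stdio.h>

    int main(void) {
        double delta_w  = 250.0 - 60.0;              /* ~190 W difference */
        double kwh_day  = delta_w * 24.0 / 1000.0;   /* ~4.6 kWh per day */
        double kwh_year = kwh_day * 365.0;           /* ~1660 kWh per year */
        double usd_year = kwh_year * 0.10;           /* assumed ~$0.10/kWh */
        printf("%.1f kWh/day, %.0f kWh/year, ~$%.0f/year\n",
               kwh_day, kwh_year, usd_year);
        return 0;
    }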
> Switching in FPGAs is incredibly power hungry since they run on much larger processes.
I don't think the comment about process is really true. From what I can tell, Xilinx is only a few months behind the biggest SoC makers in terms of process adoption, and is shipping 14nm parts currently. Not sure about Altera, but they are on Intel's process, which is a bit ahead of the competitors anyway.
In terms of switching power, you definitely pay a penalty to have the reconfigurability in hardware, but on the other hand you don't have all the unused logic that you would on a GPU. I'd guess the comparative efficiency depends on the specific problem and specific implementation, but I don't have any numbers to back that up.
It's a couple of things; process is a large part. You're also dealing with 4-LUTs instead of raw transistors, so you pay in both switching power and leakage, since you can't get the same logic-to-transistor density that's available on ASICs.
Also there's a ton of SRAM for the 4-LUT configuration so you're paying leakage costs there as well.
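As a toy software model of the 4-LUT I'm describing (not hardware, just for illustration): the logic function is a 16-entry truth table held in SRAM config bits, indexed by the four inputs, and those bits plus the routing muxes sit there leaking whether or not the design exercises them.

    #include <stdint.h>
    #include <stdio.h>

    static int lut4(uint16_t config, int a, int b, int c, int d) {
        int idx = (a << 3) | (b << 2) | (c << 1) | d;   /* 4 inputs -> index 0..15 */
        return (config >> idx) & 1;                     /* read the stored SRAM bit */
    }

    int main(void) {
        /* A 2-input AND (c and d unused) still occupies a whole LUT:
           output is 1 only when a = b = 1, i.e. table indices 12..15. */
        uint16_t and_ab = 0xF000;
        printf("%d %d %d\n", lut4(and_ab, 0, 1, 0, 0),
                             lut4(and_ab, 1, 0, 1, 1),
                             lut4(and_ab, 1, 1, 0, 0));   /* prints 0 0 1 */
        return 0;
    }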
NVidia managed to get it right only about a year and a half ago. Before that their gates leaked power all over the place.
The LUTs on Stratix are 6-input, 2-output, with specialized adders; they aren't at all the 4-LUTs you're describing here.
All in all, there are places where FPGAs can beat ASICs. One example is complex algorithms like, say, ticker correlations. These are done using dedicated memory and logic (so they aren't all that CPU friendly - caches aren't enough), and they change often enough to make the use of an ASIC moot.
Another example is parsing network traffic (deep packet inspection). The algorithms in this field use memory in interesting ways: compute a lot of different statistics for a packet, then compute the KL divergence between a reference model and your result to determine the actual packet type - histograms built in a random-access manner and then scanned linearly, all in parallel. GPUs and/or CPUs just do not have that functionality.
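A sketch of that classification step in plain C (names, the smoothing epsilon, and the uniform reference model are made up for illustration; on the FPGA, many such histograms live in on-chip RAM and are built and scanned in parallel):

    #include <math.h>
    #include <stdio.h>

    enum { BINS = 256 };

    static void packet_histogram(const unsigned char *pkt, int len, double hist[BINS]) {
        for (int i = 0; i < BINS; i++) hist[i] = 0.0;
        for (int i = 0; i < len; i++) hist[pkt[i]] += 1.0;      /* random-access writes */
        for (int i = 0; i < BINS; i++) hist[i] /= (double)len;  /* normalize to a distribution */
    }

    static double kl_divergence(const double p[BINS], const double q[BINS]) {
        double d = 0.0;
        for (int i = 0; i < BINS; i++) {
            double pi = p[i] + 1e-9;   /* epsilon keeps log() finite */
            double qi = q[i] + 1e-9;
            d += pi * log(pi / qi);    /* linear scan over the histogram */
        }
        return d;
    }

    int main(void) {
        unsigned char pkt[] = "GET / HTTP/1.1\r\nHost: example.com\r\n\r\n";
        double p[BINS], q[BINS];
        packet_histogram(pkt, (int)sizeof pkt - 1, p);
        for (int i = 0; i < BINS; i++) q[i] = 1.0 / BINS;   /* stand-in reference model */
        printf("KL(packet || reference) = %f\n", kl_divergence(p, q));
        return 0;
    }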
The Arria 10 (previous high-end Altera series) was at 20nm. The new Stratix 10 is at 14nm. UltraScale+ is in the 14-20nm range, I think, and Xilinx got there first.
(I don't know if you can publicly get Stratix 10 devkits yet, but you can get an Arria at least.)
The unused logic part isn't exactly true. The way FPGAs are built doesn't allow unused sections to be completely shut off. Instead of dark silicon it's more like grey silicon: the unused parts of the chip still draw substantial power, unlike an ASIC, where those unused portions simply wouldn't exist.
Not my personal/professional experience. Even with heavy DSP usage on nodes larger than 14nm you can still create designs with power consumption below 30 W, since you can control the frequency and use low-power design techniques. There are vendors like Microsemi/Lattice that specialize in low-power FPGAs, where you can do even better.
So there might be specific products that hit certain market segments (FWIW I really like Lattice's offerings). It's just that, watt for watt, GPUs should be more efficient since they can use the latest process and don't have to carry around the 4-LUTs + SRAM.
On the DSP side, you're using an ASIC DSP (you can't change the width, for instance) anyway on most modern FPGAs, so you're comparing ASIC to ASIC at that point.
Designing an ASIC with low power in mind will always beat an FPGA, no question there. But the design cost is prohibitive for many applications (not to mention you lose the flexibility to iterate on/patch your design at little to no cost). Compared to GPUs, though, I'm fairly certain you can do much better power-wise going with an FPGA.
You have finer control over your frequency, over which sections to power down, how often to switch, etc.
Of course you can get better price/GFLOP with GPUs, plus quicker time to market.