> Switching FPGAs are incredibly power hungry since they run on much larger processes.
I don't think the comment about process is really true. From what I can tell, Xilinx is only a few months behind the biggest SoC makers in process adoption, and is shipping 14nm parts currently. Not sure about Altera, but they are on Intel's process, which is a bit ahead of the competitors anyway.
In terms of switching power, you definitely pay a penalty to have the reconfigurability in hardware, but on the other hand you don't have all the unused logic that you would on a GPU. I'd guess the comparative efficiency depends on the specific problem and specific implementation, but I don't have any numbers to back that up.
It's a couple of things; process is a large part. You're also dealing with 4-LUTs instead of transistors, so you pay in both switching power and leakage, since you can't get the same logic-to-transistor density that's available on ASICs.
Also, there's a ton of SRAM for the 4-LUT configuration, so you're paying leakage costs there as well.
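To make the SRAM point concrete, here is a minimal Python sketch of a 4-LUT modeled as a 16-bit truth table: the four inputs form an address, and the output is whichever configuration bit sits at that address. `Lut4` and `lut_from_function` are hypothetical names for illustration, not any vendor's tool, and the model ignores the (much larger) routing configuration. It does show why every LUT carries its own configuration memory: 16 bits for a 4-input LUT, 64 for a 6-input one, all of it powered whether or not the logic is doing useful work.

```python
from dataclasses import dataclass


@dataclass
class Lut4:
    """Hypothetical model of a 4-input LUT: just 16 configuration SRAM bits."""
    config: int  # 16-bit truth table, loaded when the FPGA is configured

    def __post_init__(self) -> None:
        if not 0 <= self.config < 1 << 16:
            raise ValueError("a 4-LUT holds exactly 16 configuration bits")

    def eval(self, a: int, b: int, c: int, d: int) -> int:
        # The four input signals form a 4-bit address into the truth table.
        index = (d << 3) | (c << 2) | (b << 1) | a
        return (self.config >> index) & 1


def lut_from_function(fn) -> Lut4:
    """Compute the configuration bits for any 4-input boolean function."""
    config = 0
    for index in range(16):
        a, b, c, d = index & 1, (index >> 1) & 1, (index >> 2) & 1, (index >> 3) & 1
        if fn(a, b, c, d):
            config |= 1 << index
    return Lut4(config)


and4 = lut_from_function(lambda a, b, c, d: a & b & c & d)
xor4 = lut_from_function(lambda a, b, c, d: a ^ b ^ c ^ d)
print(hex(and4.config), hex(xor4.config))  # prints "0x8000 0x6996"
```

The same 16 bits of SRAM implement an AND gate or a 4-way parity function; only the stored pattern changes. An ASIC would build either with a handful of dedicated transistors and no configuration memory.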
NVidia managed to get it right about a year and a half ago. Before that, their gates leaked power all over the place.
The LUTs on Stratix are 6-to-2, with specialized adders; they aren't at all the 4-LUTs you are describing here.
All in all, there are places where FPGAs can beat ASICs. One example is complex algorithms like, say, ticker correlations. These rely on dedicated memory and logic (so they aren't all that CPU-friendly; caches aren't enough), and they change often enough to make an ASIC moot.
Another example is parsing network traffic (deep packet inspection). The algorithms in this field use memory in interesting ways: you compute a lot of different statistics for a packet, then compute the KL divergence between a reference model and your result to determine the actual packet type. The histograms are filled in random-access order and then scanned linearly, all in parallel. GPUs and/or CPUs just do not have that functionality.
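As a rough illustration of the histogram-plus-KL-divergence idea, here is a minimal sequential Python sketch. It assumes the "statistics" are a single normalized byte-value histogram and that each reference model is a histogram built from sample traffic; `byte_histogram`, `kl_divergence`, `classify`, and the sample payloads are all hypothetical. The random-access part is the histogram fill (each byte increments an arbitrary bin) and the linear scan is the KL sum over all 256 bins. An FPGA design would run many such histograms and reference comparisons side by side, which this sketch does not attempt.

```python
import math
from collections import Counter


def byte_histogram(payload: bytes) -> list:
    """Normalized 256-bin histogram of byte values (random-access fill)."""
    if not payload:
        raise ValueError("empty payload")
    counts = Counter(payload)  # iterating bytes yields ints 0..255
    total = len(payload)
    return [counts.get(v, 0) / total for v in range(256)]


def kl_divergence(p, q, eps=1e-9):
    """D_KL(p || q) in nats via a linear scan; eps guards empty bins in q."""
    return sum(pi * math.log(pi / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def classify(payload: bytes, references: dict) -> str:
    """Return the name of the reference model closest to this packet."""
    observed = byte_histogram(payload)
    return min(references, key=lambda name: kl_divergence(observed, references[name]))


refs = {
    "text": byte_histogram(b"GET /index.html HTTP/1.1 Host: example.com"),
    "binary": byte_histogram(bytes(range(256))),
}
print(classify(b"GET /index.html", refs))  # prints "text"
```

A real DPI engine would track several statistics per packet and use reference models trained on real traffic; the single byte histogram here only shows the memory access pattern the comment describes.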
The Arria 10 (previous high-end Altera series) was at 20nm. The new Stratix 10 is at 14nm. UltraScale+ is in the 14-20nm range, I think, and Xilinx got there first.
(I don't know if you can publicly get Stratix 10 devkits yet, but you can get an Arria at least.)
The unused logic part isn't exactly true. The way FPGAs are built doesn't allow unused sections to be completely shut off. Instead of dark silicon, it's more like grey silicon: the unused parts of the chip still draw substantial power, unlike an ASIC, where those unused portions simply wouldn't exist.