I have basic knowledge of FPGA structure and how HDLs work. Does anyone have a link on the limitations of FPGA-implemented processing vs traditional CPU/GPU architecture?
I get the sense there's a hole in my knowledge as to exactly what kinds of limits the structure of FPGAs places on the end result. And more importantly, why.
Since no experts replied, I'll make an attempt. An FPGA is an array of configurable logic blocks and routing hardware that emulates other logic. All digital hardware reduces down to primitive logic, so FPGAs can implement just about any digital design that fits on the part. The real limitations are in performance, energy use, and cost.
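To make the "configurable blocks emulate other logic" point concrete, here is a rough Python sketch of a lookup table (LUT), the usual core of a configurable logic block. The `LUT` class, the `configure` helper, and the full-adder wiring are all my own illustrative names, not anything from a vendor toolchain, and a real logic block also has flip-flops, carry chains, and so on.

```python
from itertools import product

class LUT:
    """A k-input lookup table: its truth table *is* the configuration."""
    def __init__(self, k, config_bits):
        assert len(config_bits) == 2 ** k
        self.k = k
        self.config_bits = list(config_bits)

    def __call__(self, *inputs):
        assert len(inputs) == self.k
        index = 0
        for bit in inputs:          # first input is the most significant bit
            index = (index << 1) | (bit & 1)
        return self.config_bits[index]

def configure(k, func):
    """Hypothetical helper: 'program' a LUT by tabulating any k-input function."""
    return LUT(k, [func(*bits) & 1 for bits in product((0, 1), repeat=k)])

# The same kind of block becomes whatever gate we load into it.
and_lut = configure(2, lambda a, b: a & b)
xor_lut = configure(2, lambda a, b: a ^ b)
maj_lut = configure(3, lambda a, b, c: (a & b) | (a & c) | (b & c))

def full_adder(a, b, cin):
    """A full adder built only from configured LUTs wired together."""
    return xor_lut(xor_lut(a, b), cin), maj_lut(a, b, cin)
```

Note that a 2-input AND here costs four stored configuration bits plus the selection logic, versus a handful of transistors for a dedicated gate. That overhead is where the limitations below come from.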
The flexibility of an FPGA, which comes from its routing and configurable logic, takes up a lot of die area that a fixed design wouldn't need. The blocks themselves are larger and slower than the primitives they emulate, and signals also have to pass through programmable routing switches. The extra delay means the FPGA can't be clocked as fast as dedicated hardware. If an FPGA emulates a CPU or GPU, a real CPU or GPU built on a comparable process will be faster because its logic is optimized for that one job.
The other issue is power. The FPGA carries all kinds of circuitry that has to be ready to load a new configuration and emulate something else. Because of that reconfigurable nature, the active parts also use more energy, since every signal drives extra circuitry. The result is that an FPGA generally uses more power than a custom chip implementing the same design.
The last one, cost, comes from the business model. Apart from recent GPUs, no chip has challenged FPGAs without its maker going bankrupt or remaining a tiny niche player. Additionally, the EDA tools needed to make use of FPGAs are ridiculously hard to build, and the Big Two (Altera and Xilinx) have invested a ton to get theirs as good as they are. They also give those tools away cheap or free. So anyone selling an FPGA at cost is unlikely to compete with the Big Two on tools that make the most of the FPGA. That means anyone using FPGAs will pay high unit prices to line the Big Two's pockets for quite some time.
As far as the structure goes, you have to map your hardware design onto it. Hardware is often built as pieces that connect to each other in a grid, so the mapping isn't terrible. It's just hard to do efficiently.
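A toy Python sketch of why the mapping is hard to do efficiently: place four blocks on a 2x2 grid and measure total wire length as Manhattan distance. The block names, the nets, and the cost function are made up for illustration. Real place-and-route tools also have to deal with timing and routing congestion.

```python
from itertools import permutations

SLOTS = [(0, 0), (0, 1), (1, 0), (1, 1)]      # grid positions (row, col)
BLOCKS = ["a", "b", "c", "d"]                 # hypothetical logic blocks
NETS = [("a", "b"), ("b", "c"), ("c", "d")]   # which blocks must be wired together

def wirelength(placement):
    """Sum of Manhattan distances over all nets for a block -> slot mapping."""
    total = 0
    for src, dst in NETS:
        (r1, c1), (r2, c2) = placement[src], placement[dst]
        total += abs(r1 - r2) + abs(c1 - c2)
    return total

def best_placement():
    """Exhaustive search: try every assignment of blocks to slots."""
    candidates = (dict(zip(BLOCKS, perm)) for perm in permutations(SLOTS))
    return min(candidates, key=wirelength)

# A careless placement that puts connected blocks on opposite corners.
naive = {"a": (0, 0), "b": (1, 1), "c": (0, 1), "d": (1, 0)}

if __name__ == "__main__":
    print("naive wire length:", wirelength(naive))
    print("best wire length: ", wirelength(best_placement()))
```

With four blocks there are only 24 placements to try. The count grows factorially with the number of blocks, so exhaustive search is hopeless at real design sizes and tools rely on heuristics instead. Longer wires mean more routing delay, which ties back to the performance point above.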