Hell yes! Intel chips are about to get exciting again. SGI put FPGAs on nodes connected to its NUMA interconnect with great results. Intel will likely put them on its network-on-chip, with more bandwidth and tighter integration while pushing latency down further. The '90s-era tools that automatically partitioned an app between a CPU and an FPGA can be revived once Intel knocks out the obstacles that held them back.
Combine that with the OSS synthesis work from Clifford Wolf and Synflow, which can be connected to OSS FPGA tools, and there's even more potential here. Exciting time in the HW field.
Even more exciting is the Omni-Path[1] stuff that came out of the InfiniBand acquisition. Take RDMA + Xeon Phi + the insane number of PCIe lanes[2] available for those new M.2 SSDs, which post absolutely insane numbers[3], all of it supported by ICC[4], and you've got a really budget-friendly HPC setup. I'm really hoping IBM's OpenPOWER gains traction, because Intel is poised to capture the mid-market in dramatic fashion.
Not to nitpick, but M.2 is just a form factor. The big gains come from being NVMe over PCIe, not from the form factor. You get the same gains with NVMe over PCIe in a 2.5" drive form factor.
M.2[1] defines both the form factor and (importantly) the interface. While it is true that NVMe over PCIe is the interface that makes the difference, standardizing both the interface and the form factor together seems pretty important.
Oh crap, I didn't know that. It means they have a near-NUMA interconnect, FPGA tech, and recent RAS features. They could be a supercomputing and cloud force of nature if they play their cards right. They'll have the advantage of the best process nodes, tools, and probably mask discounts. Getting more exciting...
I'm excited about this too, but the article suggests this is an industry-wide thing: IBM is doing this with Xilinx, Qualcomm is experimenting with ARM stuff (not sure how this is different from the Zynq), and AMD is also working with Xilinx. If there is something that works here, I'm sure Intel won't be the only game in town.
Can you give any examples of how FPGAs helped SGI? I'm aware that a certain Voldemort-like .gov liked them at one time, but I never saw any uptake in the real world.
Intel is a volume player; this makes FPGAs a bit of a head-scratcher, since in the Grand Scheme, products might get prototyped as FPGAs, but they jump to high-volume, higher-performance ASICs ASAP.
Using FPGAs allows new features and optimizations to be tested and productionized (almost) as quickly as regular software. Upgrading to a newer design is essentially free and instantaneous, compared to the expensive and time-consuming process of producing new chips.
I forgot to answer the other part of your question because I was skimming. I don't think the FPGAs helped SGI. They were just too hard to use and the nodes were expensive. However, SGI did the right thing at the wrong time: connecting FPGAs to memory shared with CPUs over an ultra-high-speed, low-latency, cache-coherent interconnect. This allowed FPGA co-processing to do far more than it ever could on PCI cards, where the PCI overhead was significant for many applications. It's just like how NUMA performance trumped clusters over Ethernet or even InfiniBand in many cases.
So, I was citing them as the opening chapter of the book Intel's about to write on how to do FPGA coprocessors. Hopefully. Meanwhile, FPGAs plugging into HyperTransport or an inexpensive NUMA would still be A Good Thing to have. :)
AMD was in the lead with semi-custom processors that combined its CPUs with custom logic for customers. That apparently was a huge business. Intel started doing it, too, IIRC. There's an overlap between those buying high-end CPUs and those buying semi-custom stuff, and that group might appreciate reconfigurable logic on-chip. It's a nice alternative, given you get a huge speed boost without the capital investment of ASICs and can keep tweaking it to optimize or fix mistakes.
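To put the interconnect point in rough numbers, here's a back-of-envelope Python sketch of when offloading to an FPGA pays off. Everything in it is an assumption for illustration: the function name `offload_speedup`, the simple model (compute time plus one latency hit plus bytes over bandwidth), and all of the link figures, which are made up and are not SGI, PCI, or Intel measurements.

```python
def offload_speedup(cpu_time_s, accel_factor, bytes_moved,
                    link_bandwidth_Bps, link_latency_s, transfers=1):
    """Speedup of offloading a kernel versus running it on the CPU.

    Toy model: the accelerator runs the kernel accel_factor times faster,
    but the data first has to cross the link to reach it.
    """
    transfer_time = transfers * link_latency_s + bytes_moved / link_bandwidth_Bps
    accel_time = cpu_time_s / accel_factor + transfer_time
    return cpu_time_s / accel_time


# Hypothetical kernel: 1 ms on the CPU, 20x faster in logic, touches 1 MB.
kernel = dict(cpu_time_s=1e-3, accel_factor=20, bytes_moved=1_000_000)

# Hypothetical links (illustrative figures only).
slow_card = offload_speedup(**kernel, link_bandwidth_Bps=1e9, link_latency_s=10e-6)
coherent = offload_speedup(**kernel, link_bandwidth_Bps=10e9, link_latency_s=1e-6)

print(f"add-in card link:   {slow_card:.2f}x")
print(f"coherent interconnect: {coherent:.2f}x")
```

With these made-up figures the same 20x kernel ends up slower than the CPU over the slow link and several times faster over the fast one, which is the shape of the argument above: for small or chatty workloads the link dominates, not the logic.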
I have a basic knowledge of FPGA structure and how HDLs work. Does anyone have a link on the limitations of FPGA-implemented processing vs. traditional CPU/GPU architectures?
I get the sense there's a hole in my knowledge as to exactly what kinds of limits the structure of FPGAs places on the end result. And more importantly, why.
Since no experts replied, I'll make an attempt. An FPGA is a series of configurable logic blocks and routing hardware that simulate other logic. All digital hardware reduces down to primitive logic. So, FPGAs can do just about any digital design. The real limitations are in performance, energy use, and cost.
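To make the "configurable logic block" idea concrete, here's a rough Python sketch of a lookup table (LUT), the usual building block inside those logic blocks: a small memory whose contents decide which boolean function it computes. The `LUT` class, the `program` helper, and `full_adder` are names I made up for illustration; real FPGAs add flip-flops, carry chains, and a routing fabric that this toy ignores.

```python
from itertools import product


class LUT:
    """A k-input lookup table: 2**k configuration bits, one per input combination."""

    def __init__(self, k, config_bits):
        if len(config_bits) != 2 ** k:
            raise ValueError("a k-input LUT needs exactly 2**k configuration bits")
        self.k = k
        self.bits = list(config_bits)

    def __call__(self, *inputs):
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs")
        # The inputs form the address into the table (first input = most significant bit).
        index = 0
        for bit in inputs:
            index = (index << 1) | (1 if bit else 0)
        return self.bits[index]


def program(k, func):
    """'Synthesize' any k-input boolean function into LUT configuration bits."""
    return LUT(k, [int(bool(func(*combo))) for combo in product((0, 1), repeat=k)])


# The same kind of block, loaded with different bits, becomes different logic.
sum_lut = program(3, lambda a, b, cin: a ^ b ^ cin)
carry_lut = program(3, lambda a, b, cin: (a & b) | (cin & (a ^ b)))


def full_adder(a, b, cin):
    """One-bit full adder built from two 3-input LUTs; returns (sum, carry)."""
    return sum_lut(a, b, cin), carry_lut(a, b, cin)


for a, b, cin in product((0, 1), repeat=3):
    s, c = full_adder(a, b, cin)
    print(a, b, cin, "->", "sum", s, "carry", c)
```

Note the trade-off this shows: the table stores all 8 outcomes for each function even though a dedicated full adder needs only a handful of gates. That generality is where the flexibility comes from.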
The flexibility of an FPGA, which comes from the routing and configurable logic, takes up a lot of space that otherwise wouldn't be in the system. The blocks themselves are larger and slower than the primitives they simulate. The extra delays from this mean the FPGA can't be as fast as dedicated hardware. If an FPGA simulates a CPU or GPU, the real CPU or GPU will always be faster due to its optimized logic.
The other issue is power. The FPGA has all kinds of circuits that have to be ready to load up a new configuration and simulate something else. Due to this dynamic nature, the active parts also use more energy with all the extra circuitry. The result is that FPGAs always use more power than a custom chip doing the same job.
The last one, cost, comes from the business model. No chips outside of recent GPUs have challenged FPGAs without their makers going bankrupt or remaining tiny niche players. Additionally, the EDA tools to make use of them are ridiculously hard to build, with the Big Two (Altera and Xilinx) investing a ton to get as good as they are. They also give the tools out cheap or free. So, anyone selling an FPGA at cost will be unlikely to compete with the Big Two on tools that make the most of the FPGA. That means anyone using FPGAs will pay high unit prices to line the Big Two's pockets for quite some time.
As far as the structure goes, you have to map the hardware onto it. Hardware is often done in pieces that connect to each other in a grid. So, the mapping isn't terrible. It's just hard to do efficiently.