How FPGAs work, and why you'll buy one

bhb916 · on June 17, 2013

This is a solid article. I'm continually surprised by how few software engineers in industry spend the time to pick up HDL and FPGA programming in general. In my mind, it is an easy way to expand your breadth of knowledge and make you a touch more valuable to future employers. They say that when all you have is a hammer, everything looks like a nail and I'm certainly inflicted with that same disease as I see the utility of FPGAs everywhere I look. Prices have plummeted while densities have skyrocketed. A simple $25 part gets you quite a bit of fabric and some $90 eval hardware will give you a sweet little platform. [1]

With that said, since I began working with them there have been two "Holy Grails" of FPGA design: (1) Partial Reconfiguration and (2) High Level Synthesis.

The first, Partial Reconfiguration, has been more-or-less solved although the tools have a long way to go. One current design I'm working on loads it's PCIe endpoint and DDR3 controller first, establishes communication with the application running on the host PC, then based on user input loads the rest of the FPGA.

The second, High Level Synthesis, isn't here yet. The goal is to turn a company's vast army of software engineers into FPGA programmers overnight. A worthy cause. Every foray into this field has failed (although the jury is still out on Xilinx's purchase of autoESL) Honestly, I'm not sure it will ever get there. The point of optimized, custom hardware is to make use of it. Abstracting it all away seems counterproductive, not to mention very hard.

[1] http://www.xilinx.com/products/boards-and-kits/AES-S6MB-LX9....

lambda · on June 17, 2013

The last time I looked into FPGA programming (probably back when I was in college), the cheap eval hardware cost at least a few hundred dollars a board, if not a few thousand. For something that would just be a toy to play around with, it was too much to justify spending money on.

On top of that, the tooling is all proprietary, clunky, with obnoxious licensing. If you notice, that eval board you pointed to comes with "ISE® WebPACK® software with device locked SDK and ChipScope™ Pro licenses". I'm just not interested at all in any "device-locked SDKs".

And finally, there's a bootstrapping problem. FPGAs aren't really a mass-market platform, because no one has them (unless they're developing custom hardware).

If there are now $90 eval boards, that solves part of the problem. Another big step would be building a reasonable toolchain around it that does not consist of big clunky proprietary tools with onerous licensing.

And finally, someone will need to start shipping FPGAs with a standardized interface on commodity hardware. If I can depend on their being an FPGA in a phone, in a workstation, or in COTS server hardware, there are lot more possible applications. As it is, you already have to be building your own hardware before an FPGA is something you would want to target at all. The vast majority of programmers write code to run on COTS hardware (servers, desktops, mobile), not custom hardware.

tlrobinson · on June 18, 2013

And finally, someone will need to start shipping FPGAs with a standardized interface on commodity hardware.

If Apple included an FPGA in their machines with some cool sounding marketing name like Particle Accelerator™ it would probably catch on.

The last time I looked into FPGA programming (probably back when I was in college), the cheap eval hardware cost at least a few hundred dollars a board, if not a few thousand.

Not sure when you were in college, but we used FPGAs in my upper level computer engineering courses, circa 2006, and the Xilinx dev boards we used were about $120.

woodchuck64 · on June 17, 2013

> Another big step would be building a reasonable toolchain around it that does not consist of big clunky proprietary tools with onerous licensing.

While I think a toolchain other than Xilinx or Altera is out of the question due to technical complexity, Quartus Web Edition is free and supports the low-cost Cyclone devices including the ones with ARM cores (SoC): (pdf) http://www.altera.com/literature/po/ss_quartussevswe.pdf. I imagine Xilinx has something similar.

> And finally, there's a bootstrapping problem. FPGAs aren't really a mass-market platform, because no one has them (unless they're developing custom hardware).

Yes, that's the chicken/egg problem that both Xilinx and Altera are addressing by putting ARM cores on their devices that run just fine without any FPGA configuration at all. It's an appeal to software developers while pitching the FPGA hardware as a super-special add-on feature really cool to have for ... something.

Altera has just started shipping the first SoC devices I think, about 6 months or so behind Xilinx. Now we'll see if the strategy works...

VLM · on June 18, 2013

The problem with eval boards is the tendency to explode price upward by sticking peripherals on the board. Sure, a 10/100 ethernet intf sounds cool and it'll only add $5... times 50 other devices and next thing you know you have one of those cool, but expensive Digilent boards that has more I/O options than you could ever use, but costs $200.

If you just want a nearly bare FPGA google for "micronova fpga mercury" and for $60 you get a DIP-64 circuit board with quite a few I/O and that's about it. No on board graphic LCD. No on board VGA output jack and D/A interface. Think, the basic stamp from 20 years ago, now a FPGA dev tool. I have one sitting on my desk waiting to fool around with it (I have a lot on my desk...)

Also all the software is free now. Maybe limited such that you need a license to synthesize for a $2000 chip, which doesn't matter. Pretty much if your average garage hacker can afford the dev board, the limits on the software don't matter. Closed source, but available for linux and windows and free.

jbester · on June 18, 2013

Papilio one and papilio pro are both under $100

Vendor - https://www.sparkfun.com/products/11158

OEM - http://papilio.cc/

olouv · on June 17, 2013

You can have a small FPGA test board here http://be.eurocircuits.com/shop/offtheshelf/product.aspx?&an... for 60€. The tutorials (non-free) can be found there http://www.elektor.com/magazines/2012/december/taming-the-be...

lambda · on June 17, 2013

That only addresses the one point (expense of the test boards), which as I pointed out, is the one major problem that has been solved (I consider a $90 dev board reasonably affordable for tinkering, so a $70 dev board is better, but not by much).

It doesn't seem to solve the other two problems, though I suspect that solving the second (lack of open tools) would go a long way towards solving the third as well (ubiquity of hardware available to target).

rwmj · on June 18, 2013

The proprietary software is always going to be there (unfortunately).

The precise hardware layout of an FPFA is important for performance and a trade secret. Proprietary software algorithms can squeeze more out of the fixed hardware.

dllthomas · on June 18, 2013

If proprietary software is why I'm not buying an FPGA (and if there are a lot of people like me) there will be much more to be made opening things up than keeping things closed. Not saying that this is the case, just that it could be, so "always" might be an overstatement.

rwmj · on June 18, 2013

I hope you're right, I work at Red Hat after all.

However I still don't think it's likely. The trouble is this is all tied to the hardware. FPGAs are integrated circuits with a complexity and development cost close to that of CPUs. (The fact that you can only use about 4% of the density of the FPGA is beside the point here). Because of this huge barrier to entry -- building your own fab -- we don't have fully open source x86-compatible high end processors, and we don't have open FPGAs either.

As long as the hardware is closed, the software for doing FPGA design is also going to be closed, for the reasons I outlined in my comment above.

dllthomas · on June 18, 2013

Yeah, it's more hope than expectation; it just seems unlikely not absurd though.

Brashman · on June 18, 2013

If you're interested in a non-proprietary toolflow, you can try taking a look at VPR (http://www.eecg.toronto.edu/vpr/). It's an academic place and route tool out of the University of Toronto. You may still need to use some parts of the proprietary toolflow. I've never used it to actually run on an FPGA, but it's a popular tool for academic research in FPGA design and EDA algorithms.

Florin_Andrei · on June 17, 2013

http://www.adafruit.com/products/451

robomartin · on June 18, 2013

> I'm continually surprised by how few software engineers in industry spend the time to pick up HDL and FPGA programming in general.

While I appreciate your sentiment, it isn't programming, it's hardware design. To do anything beyond the somewhat trivial you need to learn a pile of EE stuff. Thinking of FPGA's as software might be nice, but that's not reality. Some examples of this are signal integrity, metastability, transmission lines, propagation delays, timing closure, SSO, etc. And that's the tip of the iceberg. You can't just shove in a 600MHz clock, write something that looks like software and expect the darn thing to work. I once spent nearly three months chasing a 3 nanosecond timing problem on a complex design. The "code" was perfect. It was a layout, routing and timing problem. FPGA's are not software.

Of course, this is assuming an EE has done the hard work of designing a solid board with all the right I/O, interfaces as well as the all the initialization and setup files. Again, as an example, a DDR3 bank is not a trivial piece of hardware to design, layout, interface and configure.

None of this is to say that it is impossible for a non-EE to learn and become quite capable at developing with FPGA's. however, this isn't just like learning a new programming language. I'll venture a guess that most programmers would have trouble committing to going down the deep and twisted rabbit hole they'd have to navigate. Again, not impossible. Seen it done. But it isn't software and it requires serious dedication.

yosefk · on June 18, 2013

I agree about the pile of EE stuff, but it's not unlike having to learn a pile of math to work on, say, graphics; it's two areas into which programmers can naturally branch out, but it's not trivial.

And yes, you'd need EEs around; you'd need artists around to ship nice graphics, ultimately, or you'd need to be one yourself. Another angle - to be really good at C, you basically have to be fluent in assembly, at least in reading it, so you could say that C isn't just another language, or "C isn't a high-level language, it's a portable assembler", analogously to "FPGA isn't software, it's hardware design." In reality there's a continuum between hardware and software design, as there is a continuum between programming and math, programming and art, high level languages and low level languages, etc.

robomartin · on June 18, 2013

Well, we might have to disagree. Implying that it is just a matter of math is almost like saying that any programmer could just walk in the shoes of an EE by just doing some math. Which, of course, isn't even remotely true.

I have a high-school age kid. I taught him Java, C and chunks of PHP. He's already written a half dozen simple games. All of that in about a year of sporadic time. Now I am gettimg him started in electronics. The road ahead is far more difficult than learning to program in any language. Buying Arduinos isn't knowing electronics. There's a vast difference between using an Arduino and designing one. And, BTW, that's relatively simple embedded stuff.

I am not saying that a talented software guy can't learn to design with FPGA's. Not at all. Talented driven people can do anything with enough motivation. All I am saying is that EE doesn't magically turn into software development just because one is using an FPGA. Examples of the complexity and range of disciplines required abound, things like thermal design, signal and power integrity are disciplines in and of themselves that are often the domain of specialized EE's in design teams. The first time I laid out a design with clocks ranging from tens of MHz to GHz, as an EE with excellent command of the science behind the task at hand, it took me months to get it right. That had to work perfectly before my FPGA design and embedded code had even the slightest chance of making the board go.

lkozma · on June 18, 2013

I don't think he meant that EE is just a matter of math, but rather that branching out into EE can be thought of as analogous to branching out into math.

epsylon · on June 18, 2013

> I agree about the pile of EE stuff, but it's not unlike having to learn a pile of math to work on, say, graphics

If you're thinking about CGI, honestly I'd say that the maths really isn't that hard. Most of it is basic linear algebra and a bit of signal processing.

I might be biased in saying that (because I have a solid maths background as well as a computer graphics one) ; but I've done a lot of electronics as well (including working on FPGAs) and I still feel that EE is a whole different world.

(Off-topic: I realized that you were the guy behind the C++ FQA. Thank you a million times for that! )

schabernakk · on June 17, 2013

> I'm continually surprised by how few software engineers in industry spend the time to pick up HDL and FPGA programming in general.

unsufficient tooling? I did some programming with FPGAs once and it seemed the best option I had for programming was proprietary software by altera (quartus). I never got debugging to work or perhaps I did and I didnt understand the stuff it was showing me (I am no hardware guy)

My impression was that an eclipse-like ide, perhaps with a built in HW simulator would make things A LOT easier, especially for beginners like me. Of course this could be completely unrealistic and impractical for hardware design in which case I will show myself out.

yosefk · on June 17, 2013

A free simulator (like iverilog) plus a free waveform viewer (like gtkwave) is a nice way to start fiddling with these things. Verilog is way - WAY - easier to deal with than say C, actually, because you have much more visibility (waves instead of a variable view) and much better error detection (none of those bloody memory overruns, built-in Valgrind with the "X" values, etc.)

I'm a programmer by training, and I got into hardware architecture in part because of how fun Verilog was, actually - seeing all this stuff.

Compile and run Verilog online (which is how I tested the Verilog code in the article):

http://www.compileonline.com/compile_verilog_online.php

Running on FPGA without testing on a simulator first can indeed be tough for a newcomer, I'd guess.

jlgreco · on June 17, 2013

Eh, gtkwave is the sort of thing that I feel allergic to. I've used it extensively in the past but at no point was it something that I enjoyed having to use. Moreso than not liking to work with proprietary software, what I really don't like to do is work with non-browser/non-terminal based programs. Those sort of tools just really don't seem like tools that were designed with users like me in mind. If I have to work with programs like that then I of course will, but in my personal time that sort of program drives me away.

If gtkwave took some high-level inspiration from graphviz, that would be great.

Also, good god is GHDL horrific...

simias · on June 17, 2013

I've started to learn hardware design a couple of months ago and this is good advice.

However I'd like to mention that many constructs work fine in simulation but are completely backwards/impossible to synthesize in FPGA or ASIC. Writing verilog is easy, writing good verilog much more difficult.

I don't know if there are any good and comprehensive resources for learning hardware design coming from a software background, I have the chance to work with some ASIC guys who can help me with those things.

alok-g · on June 17, 2013

I seriously want someone to disrupt FPGA tooling.

In my understanding the tooling for FPGAs has been made purposefully complicated to achieve a lock-in. The issue is that adoption of new tooling practically requires compatibility with the existing tools from the two key market holders, and I do not think even they themselves could do that anymore.

ChuckMcM · on June 17, 2013

Yes and no. So on the one hand there are lots of open source efforts at both Hardware Description Languages (HDLs) such as JHDL, variations on Systems C, Verilog, and the async work that was going on at Utah State I believe. But the "place and route" bits are, by their nature, unique to the FPGA architecture. Further there is a lot of work on encrypting bit streams so that designs can't be "stolen" and you end up with a very proprietary 'blob' at the bottom of the stack. Xilinx did an "open" FPGA for a while (the 3000 series) but nobody used it at the time in volume.

That said, the complexity of the FPGA tools is also as much about the circuit capabilities (describing IO pads for example in terms of their power levels, latencies, and connectivity) as it is about the overall complexity of the problem.

The methodology for designing these things is surely straining. And of course its a very test driven practice, since no hardware engineer ever seems to just "try" something in an FPGA until they have a testbench that can simulate it. (the equivalent of unit tests in software).

Most (all?) vendors offer a bit of a free stuff, and I know the Xilinx place and route engine can take its input from any EDIF source so you can write your own 'design' tools if they output EDIF. I'm a bit scarred because there was an effort called the "CAD Framework Initiative" which was going to standardize APIs between all the layers of the stack but once vendors figured out that their high priced tools could be easily disrupted they backed out of that standard in a hurry. Too bad really.

marshray · on June 17, 2013

An industry ripe for disruption. My guess is that patents and military contracts are propping up the few entrenched vendors.

Eventually, the simple FPGA designs from 20 years ago will perform "good enough" when shrunk down to modern manufacturing processes. Only then will the new age of reconfigurable computing begin.

shubb · on June 18, 2013

As far as I've seen, the main users of FPGAs are developers eventually targeting ASICs.

A video codec or whatever might first be implemented in C++, then 'translated' to verilog and tested thoroughly on an FPGA. Having solved all the logical issues, and detected a lot of potential timing issues, the HDL design could be translated into an ASIC design, and heavily simulated. Then, confident that the design is good, the company could spend the bucks to make a mask for mass manufacture.

You do occasionally see people using FPGAs where they need the zero latency of a hardware design, but only need a couple of devices. Usually this is in RF research labs and the like.

I suspect most people with compute problems would be better off using a GPU.

marshray · on June 18, 2013

Perhaps. But I think we'll never know unless they can be fit within a mainstream software workflow at a reasonable price.

yosefk · on June 17, 2013

It's genuinely complicated; if Xilinx could disrupt Altera by making easier-to-use tools, it would. (In fact it tries with AutoESL.)

I hope to explain why FPGA tooling is intrinsically hard in my next write-up.

amirhirsch · on June 17, 2013

here's a few good reasons why it's hard to make easy-to-use vendor-neutral FPGA tools:

- all the devices/bitstream formats are proprietary with little or no documentation of the logic blocks or programmable interconnect structures. it is probably technically easier to build a new FPGA from scratch and design tools for that, than to reverse engineer existing chips [1]

- there is very little cross-compatibility between vendor products (a 4-lut here, a 6-lut there, some carry-chain logic here, a DSP block there)

- all the optimizations (synthesis, place-and-route) are NP-hard problems

- sequential imperative (C-like) thinking is not the correct way to make parallel systems

- the FPGA vendors compete on tools and offer their software for free to push hardware. hard for an independent vendor to compete.

[1] some reverse engineering efforts exist. see "From the bitstream to the netlist" http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.117... / http://code.google.com/p/debit/

mjn · on June 17, 2013

all the optimizations (synthesis, place-and-route) are NP-hard problems

I suspect this one in particular could be chipped away at pretty successfully if the tooling were less closed. When you get some open benchmarks and let both researchers and systems-builders hack away at it, you can get very good practical solutions to NP-hard problems, as the current crop of SAT-solvers, SMT engines, and TSP-routing tools attest.

amirhirsch · on June 17, 2013

I wrote an post 5 years ago about how an electronic-design-automation-as-a-service company could sell access to a huge supercomputer for solving these problems faster: http://fpgacomputing.blogspot.com/2008/08/megahard-corp-open...

most researchers make up their own architectures to demonstrate place-and-route algorithms, for example there is this challenge to improve p&r: http://www.eecg.toronto.edu/~vaughn/challenge/challenge.html

fieldforceapp · on June 18, 2013

Commercial "vendor neutral" tools do exist, Synplicity was a public company which sold such tools for several years before branching into proto board hardware sales and then being acquired by Synopsys [1]. Their tools for synthesis, partitioning and debug are still available [2] and strong sellers to both Altera and Xilinx FPGA developers.

[1] http://news.synopsys.com/index.php?s=43&item=570 [2] http://www.synopsys.com/Tools/Implementation/FPGAImplementat...

alok-g · on June 17, 2013

While I find your comment informative, points 3 and 4 do not really explain why vendor-neutral tools cannot exist. Point 2 (my best understanding) should not be a breaking issue if documentation and specifications are available (which is covered by point 1). It seems to me that point 5 is part of the thing they have done to prevent alternative tools to come to market. Lastly, point 1 is exactly what begs for disruption.

makomk · on June 17, 2013

The manufacturers also treat the exact implementation details of their FPGAs - how exactly all the routing fabric is structured, the fine details of some of the macros, etc - to be their secret sauce that gives them an edge over their competitors. So they're pretty hostile to the idea of documenting anything.

bhb916 · on June 17, 2013

It being "hard" shouldn't give them a pass on quality control. From the early Project Navigator to ISE to Vivado, Xilinx tools are buggy, poorly documented, and completely unintuitive. Yet, they're still better than Altera's.

alok-g · on June 17, 2013

I had never used Altera. It's sad to hear that it is even worse!

bhb916 · on June 17, 2013

I moved one of our smaller projects over to Altera because they were first to 28nm plus the local Altera team was willing to help out with porting any device-specific code.

I was very disappointed with the tools. It's worth a write-up all on it's own, but my staff struggled through the tooling which consumed most of their time.

alok-g · on June 17, 2013

Just asking, not questioning: Are you sure they would not have similarly struggled if they were pros in Altera and were moving to Xilinx? An issue with counter-intuitive and buggy tools is exactly that you get used to them, your mind slowly forgets how messy a design the tools have, and now everything else seems counter-intuitive.

bhb916 · on June 17, 2013

That's a good question. It's almost impossible to tell objectively. I would say that the Altera tools felt more like circa 2007 Xilinx Tools. They lacked quality control AND maturity. This is all very subjective, though.

djcapelis · on June 18, 2013

Ugh. I remember 2007 Xilinx tools and I wouldn't wish them on anyone. I was so surprised when I started doing FPGA work again and found even the old ISE versions I'm using (2010/2011-ish) are so much better!

I have access to newer software, but the designs I'm working with don't import correctly into them. :-P

mjn · on June 17, 2013

I'm waiting for Chuck Moore to release 2kb of Forth that he claims replaces the only parts you need...

fnordfnordfnord · on June 17, 2013

Please do.

I have some opinions here and I wish I was prepared to make good comments. I've seen some horrible things. I hope you give it as thorough a going-over as you did in this article.

alok-g · on June 17, 2013

@_yosefk: I am interested in this write-up. Will wait for it. Thanks

DigitalJack · on June 17, 2013

The problem is a lack of overlap between software engineers and hardware engineers, particularly in the open source arena.

I'm a digital designer, and do software as a hobby. I've thought about writing up a guide to get into fpga hacking. Tools are always a hindrance. Simulation has some free tooling, but as far as I know there is nothing for synthesis.

EDIT: There is this https://code.google.com/p/vtr-verilog-to-routing/. It targets hypothetical architectures.

There could possibly be a kickstarter idea here to pick an architecture they have targeted and get some designers to implement this. You could even host one of these theoretical architectures on top of another FPGA. It'd be slower than the host, but feasible.

yosefk · on June 17, 2013

I think partial reconfiguration is actually vital - you can't do stuff without it - while C-to-Verilog is not. Though I've seen people loving AutoESL; one claim - always repeated about higher-level languages - is that you can explore more possibilities and thus beat hand-coded Verilog because of the time it takes to hand-code yourself into a local optimum while missing the global optimum. I sort of ignore the language question, because Verilog feels nice enough to me, though perhaps it's important in the sense that most programmers want C.

lambda · on June 17, 2013

> most programmers want C

Hmm. I've always thought that most programmers want languages that offer more abstraction than C. C is probably about the opposite of what you want, since it's a fairly low level language targeting a fairly different model of computation than an FPGA. What you want is higher-level tools that are designed and built around this much more parallel model, not something that allows you to compile C to an FPGA.

In fact, I think that there would be a lot more interest in creating better languages and tools for FPGAs, if there were actually open documentation on the low levels of how to program FPGAs, so people could actually write their own tools targeting them. But most vendors seem to just provide their own Verilog compilers with onerous licensing restrictions, and there's little portability between hardware, so there's nothing really good to target.

skaevola · on June 18, 2013

AFAIK C is by far the most prevalent language for embedded systems, which are typically the applications FPGA's are also used for.

bhb916 · on June 17, 2013

I agree with your assessment. PR is important to designers and system engineers and architects. C-to-Gates is important to business types.

sliverstorm · on June 17, 2013

The best part of learning HDLs and other bare-metal experience, I've found, is it taught me to think in parallel and asynchronous frames.

fieldforceapp · on June 18, 2013

>High Level Synthesis isn't here yet

Plenty of commercial development on that front recently. Forte Design Systems [1] is selling tools targeting both FPGA and ASIC design using a C++ class library front end (SystemC). Synopsys has two flows in their Synphony product line, one which synthesizes from C called Synphony C Compiler [2] and another flow based on Matlab libraries which isn't really synthesis per se but supports mapping between Matlab Simulink to FGPAs [3].

Use of these (including AutoESL) seem focused on certain domains. Image processing, cryptography and machine vision seem popular areas just now.

[1] http://www.forteds.com/products/cynthesizer.asp [2] http://www.synopsys.com/Systems/BlockDesign/HLS/Pages/Synpho... [3] http://www.synopsys.com/Systems/BlockDesign/HLS/Pages/Synpho...

bhb916 · on June 18, 2013

I find it interesting that Synopsys would attempt to compete directly with Xilinx's System Generator product. The latter is to my knowledge completely endorsed and supported by The Mathworks. Something to investigate ...

fieldforceapp · on June 18, 2013

The comercial tools market is usually based around "flows," which is a fancy way to say they try to bundle their offerings. While Xilinx is primarily interested in an FPGA flow, Synopsys of course would be happy to help you retarget your design to the more expensive ASIC flows as well. In addition, Synopsys offers hardware prototyping & emulation. All of which means while there's competition with Xilinx (and Altera, and Mathworks too) each vendor is focused a different use case of FPGA's.

damian2000 · on June 18, 2013

I was lucky enough to be able to do a tiny bit of FPGA programming last year in Verilog. Its deceiving how similar to C it seems initially, then when you start getting into it how vastly different it is to normal programming. I remember thinking that everything is literally running in parallel.

dualogy · on June 17, 2013

> I see the utility of FPGAs everywhere I look

Examples?

jff · on June 17, 2013

I see the utility anywhere a simple, repetitive task is performed with interaction from the physical world.

The canonical student example is the soda machine. Sure, you could put a small Linux system in there and write Python code, but then you have millions of lines of Linux and Python code underlying your simple little soda vending code. Instead, you describe the problem as a state machine and implement it in the FPGA. This gives you an instant-on solution.

I often find myself looking at the behavior of basic systems like automatic doors, card readers, thermostats, and motion-triggered light switches, thinking about their behavior and how I might describe it as a state machine. Thinking about what combination of inputs and outputs, what sensors and actuators would be needed to perform the task. It's fun.

Florin_Andrei · on June 17, 2013

> The canonical student example is the soda machine. Sure, you could put a small Linux system in there and write Python code, but then you have millions of lines of Linux and Python code underlying your simple little soda vending code. Instead, you describe the problem as a state machine and implement it in the FPGA. This gives you an instant-on solution.

If it's indeed just a soda machine, you could implement it in a microcontroller and code plain C for the whole thing. The FPGA seems overkill.

Take Atmel for example. The Arduino boards are based on the Mega series. Fairly complex things (the ATMega microcontrollers), yet easy to program. I've played with the Tiny series - they're great if you don't need that many inputs / outputs.

http://www.atmel.com/products/microcontrollers/avr/default.a...

It's instant-on and very low power.

To me, an FPGA is better for real-time video scrambling and stuff like that.

HeyLaughingBoy · on June 18, 2013

Exactly, and this is pretty much the answer to the OP's question about why software engineers haven't embraced FPGAs.

Answer; there are other solutions that are simpler and much closer to their existing knowledge base. You want to build a soda machine? I can implement that logic in an 8 pin, fifty cent Atmel tinyAVR device that can be programmed in C or C++

Skills of software engineers are probably better applied to building better FPGA toolsets.

kragen · on June 18, 2013

You can probably also implement it in a PLC, which costs more than fifty cents but already includes the relays that you need to drive the solenoids that drop the soda cans and the darlingtons you need to drive the relays. Overall it might be cheaper and easier to program. The US$100 Phoenix 2701043 is the low end here: http://www.digikey.com/product-detail/en/2701043/277-2648-ND...

You might reasonably ask why there aren't 8-pin fifty-cent FPGAs. I don't really know, but some PLDs (not PLCs) come pretty close; http://www.digikey.com/product-detail/en/ATF16V8B-15JU/ATF16... was what I came up with on a Digi-Key product index browse: the Atmel ATF16V8B-15JU-ND, 84 cents in volume, a 20-pin Flash PLD with 8 macrocells, each with a flip-flop, 8 I/O pins, 10 input pins, and 10ns pin-to-pin latency. You can probably do a soda machine with 256 states. But an ATTiny has at least 1024 bits of RAM, which is a lot more than 8, and can do much more complex logic; it's just hundreds of times slower than the PLD.

A little higher upmarket, there are CPLDs like the 24-pin ATF750LVC-15SU-ND, which goes for US$3.72 in volume, with 20 flip-flops (bits of memory), 10ns pin-to-pin delay, EEPROM, 171 product terms feeding into 20 sum terms, and 12 input pins and 10 I/O pins.

For a while there was a Spartan FPGA that went for US$5, but perhaps due to a lack of articles like Yossi's here, it didn't do well and seems to be out of production.

Basically I think that when we're talking about state machines that are constrained to operate at mechanical speeds, it makes more sense to use the microcontroller approach — when your response times are measured in microseconds or milliseconds instead of nanoseconds, when you really only need one addition or multiplication per clock cycle instead of 8 or 100, you might as well do them one at a time and spend the extra real estate on more programmability rather than more computational power.

On the other hand, there are lots of computational tasks where it really would be nice to be able to do more ops per cycle.

skaevola · on June 18, 2013

I guarantee the things you are thinking of are implemented in microcontrollers, not FPGA's. FPGA's aren't even that good at state machine type logic - they excel in parallel tasks which make the most use of each (longer) clock cycle.

chinpokomon · on June 18, 2013

Similar to the NAND to Tetris article posted earlier, when I went through college, we built a microcontroller using Altera EPROMS. I had been programming for years prior, and understanding how the ALU could be performing many simultaneous calculations only switching the output based on the opcode, completely changed my perspective. The newer FPGAs sound pretty nice and significantly more capable than what I've worked with. Definitely something I'll keep in mind.

vasco · on June 17, 2013

There's a lot of good information in the article for software people for whom hardware design is strange but I think the author makes some strange points. For starters, its strange to have him refer to Verilog as a programming language multiple times all over the article. Anyone that has ever tried to do even minimal stuff on hardware will know this the wrong way to think about things.

Verilog, VHDL and whatever other hardware description languages should not be approached as programming languages, or you'll have a bad time. You need to think about what you are generating. By using HDL's we have the ability to avoid messing with crappy interfaces where you drag chips and connect wires by hand, but ultimately you really need to know what's being generated. If you write a couple of lines and have no idea of the hardware behind it, you'll likely be making a lot of mistakes. Everyone that has tried messing with FPGA's without thinking about this has ended up with hundreds of generated latches among other niceties... There's already some degree of syntactic sugar on VHDL processes (or verilog @ blocks) which make it really easy to shoot yourself in the foot by abstracting stuff.

Ultimately you'll want to know what's going on rather than not. Hardware is hard, but when you need the performance you'll do it the right way, or you might as well save yourself a lot of trouble and just go with a beefier machine and some optimized C code.

yosefk · on June 17, 2013

I agree that you can write horrible Verilog by virtue of not understanding of what it compiles down to; but you can write horrible C because of not understanding how it compiles down to just as well, not to mention C++, not to mention languages where everything costs a ton asymptotically (like copying lists all the time - list(reversed(values)) in Python, etc.)

With hardware some people see it as particularly preposterous because of just what it is that you're wasting; but it's not that different, really. In my view for instance C is overly high-level because you can do ptr-ptr2 and it divides by sizeof and you have a division sitting in there that you don't see. Well in Verilog you can do an x%y and it synthesizes, and I worked on chips that went into mass production with this idiocy instead of using the fact that y was a power of 2. But it works just fine because it's just one piece of idiocy in a big design which is not all made of idiocy.

The upshot is, I said in there that some people will consider my perspective a tad strange, but yeah Verilog is a programming language :) with all the usual virtues and vices of one.

scott_s · on June 17, 2013

The difficulty I had was not writing horrible Verilog, but wrong Verilog. My mental model for computation was so defined by store-program computers that it took me a long time to build the correct mental model that let me write correct Verilog. And, as the parent poster pointed out, that very much involved thinking about physical things, being statically laid out.

My experiences: http://people.cs.vt.edu/scschnei/ece5530/

makomk · on June 18, 2013

I know that on Xilinx's FPGA toolchain, x%y cannot be synthesized unless y is both constant and a power of 2, in which case it just takes the n least significant bits as though you'd written that in the first place. Are you sure the tools you were using don't do the same thing?

makomk · on June 17, 2013

HDL development is kind of weird in some ways compared to conventional coding though. For example, the most portable way to make use of the SRAM hard macros that all modern FPGAs have is like this: you write HDL that describes how the SRAM behaves, and if you describe it in the right way it's replaced with actual SRAM through some magic of the synthesis tools.

yosefk · on June 17, 2013

This is much worse in ASIC, where there's no standardized SRAM interface and you have built-in self-test wires coming in that uglify each non-standard interface tremendously.

HDLs have their share of warts - pragmas in comments, vital information in non-portable synthesis scripts, etc.; well, C has its own pragmas and vital info in linker scripts and stuff as well... Verilog is in some ways much prettier because for instance it compiles down to itself - the output of synthesis tools, a netlist, is Verilog source. In C the closest approximation is int main[] = {non-portable hexadecimal representation of machine instructions}.

ajross · on June 17, 2013

I'm curious about the ASIC comment. Surely no one is "synthesizing" SRAM in ASIC by describing its behavior with an HDL, right? You just decide at design time to use an SRAM array of whatever size, then (for simulation) just stub it out with an appropriate HDL model that gets replaced during synthesis. The "SRAM" abstraction in that model is provided by the semiconductor fab, no?

At least, that's how I've always assumed things work.

yosefk · on June 17, 2013

It works as you describe, pretty much, the gnarly bit being that there's no standard interface for an SRAM (that is, SRAM is not in the standard library, unlike say simple gates.) One reason is the many different ways to implement BIST and connect the BIST signals from all memories in the chip into a coherent whole.

cnvogel · on June 18, 2013

Here's an article (not written by me, just found by using google) that has an example how the VHDL/Verilog compiler infers to use a dual-ported SRAM from code that describes the behavior, including the bugs you'll encounter.

http://danstrother.com/2010/09/11/inferring-rams-in-fpgas/

rockdoe · on June 18, 2013

Verilog is in some ways much prettier because for instance it compiles down to itself - the output of synthesis tools, a netlist, is Verilog source.

This is a terribly lame analogy. You could have the C compiler output C that exactly resembles what the actual CPU instructions do. And even write an assembler for that if you wanted.

Nobody does this because there's no real point.

kragen · on June 18, 2013

I think it would be absolutely awesome if the assembly output from my C compiler were syntactically-valid and semantically-equivalent C. Among other things, it would enable me to compile it (with a C compiler rather than an assembler) to run on a different CPU architecture. And then I'd just need a disassembler to generate it in the first place.

Unfortunately this isn't possible because you can do a lot of things in assembly that you can't do in C. RET, say.

schabernakk · on June 17, 2013

yup. I worked on a simple game AI (Connect6) for a university project implemented in Verilog. Initial prototyping in python was super easy and quickly done, translating the same program to Verilog took multiple weeks. We split the program up into multiple modules with each module handling different cases. While writing the program we only compiled it with the modules we were currently working on and even that took long enough (remember: we were trying to more or less directly translate the python program to verilog with a bit more global state). Once we finished we tried to compile the whole program with each module enabled and it compiled and compiled and compiled. We aborted after 20 hours of compiling and no end in sight, the design we were using was just not suited to be used with hardware.

Now, this was the first time ever I worked with hardware and while I really liked having something tangible and all those nice blinking lights on the fpga board, programming one was a pain in the ass. Towards the very end of the project I slightly got the hang on how you should approach writing such software. And the most important thing is to not think like you would when programming traditionally.

Designing and later implementing a FSMD for example is far more efficient.

paraboul · on June 17, 2013

Let me put this here : https://github.com/milkymist/migen/tree/master/migen

wtracy · on June 17, 2013

Some documentation or even a basic explanation of what this is would be helpful.

jontas · on June 17, 2013

You can find that here:

https://github.com/milkymist/migen

(one directory up from the OP's link)

sporkologist · on June 17, 2013

> Migen (Milkymist Generator)

a Python toolbox for building complex digital hardware

jwr · on June 18, 2013

The problem with FPGAs is that they are a niche technology that is unable to exit its niche.

As someone who used FPGAs in a hardware design and regretted it later: there are two major problems with using an FPGA:

* it is too expensive for cost-sensitive devices,

* moving data onto and out of the chip is and always will be a problem.

Now, as to #1 above, you will hear FPGA enthusiast say that prices are just about to fall as more and more devices are produced. But I've heard such claims since at least 1998 and so far this simply hasn't happened.

As to #2, the number of MAC operations is irrelevant if you can't supply the chip with data. This is something even modern GPUs have trouble with, in spite of the monstrous bandwidth of the PCI slots they are placed in. Most algorithms will need to move data in and out, and use intermediate storage. Once you do all that, you end up with an expensive and complex design, that doesn't perform all that well anymore, and that is a pain to program and debug.

Also, if what you're doing is high-performance computing, then you have to compare the expenses with just getting a larger AWS instance or a cluster.

As a result, I tend to advise younger programmers to learn about FPGAs, but not become obsessed with them, as they are very rarely the right tool for the job. I'd even say they are almost never the right tool, except in very rare circumstances.

yosefk · on June 18, 2013

With modern FPGAs, you have the CPU on the same chip, which means you can use just one chip and then it's OK for cost-sensitive devices - and your problem of moving data across chips is gone as well.

I'm not an "FPGA enthusiast" myself - I work in chip design and Xilinx is a direct competitor. I think FPGAs have their share of problems preventing their adoption and I'll write a follow-up to elaborate. But the problems you mention are between solvable and solved.

(BTW it's true for GPUs just as well: just integrate them on the same chip, and forget about the PCI slots. Every high-end to mid-range cell phone has done it a long time ago.)

rockdoe · on June 18, 2013

With modern FPGAs, you have the CPU on the same chip, which means you can use just one chip and then it's OK for cost-sensitive devices - and your problem of moving data across chips is gone as well.

The speed of the typical CPU on board an FPGA is ludicrously slow compared to the amount of data the FPGA itself can process.

BTW it's true for GPUs just as well: just integrate them on the same chip, and forget about the PCI slots. Every high-end to mid-range cell phone has done it a long time ago.

They pretty much have the same problem, exactly as OP stated. Nobody's using mobile phone GPUs for algorithmic acceleration. Hell enough of them have huge weak spots in doing texture upload/download themselves.

wcunning · on June 18, 2013

They're not all that much slower. Take a look at http://zedboard.org/. That's a Cortex A9 dual core proc attached to a hell of a lot of FPGA fabric. In general, I would expect that the speed of data "out" of an FPGA is going to be slower than the speed of data "in" to an FPGA, as well as being somewhat less data "out." I would expect to pull in a lot of ADC data and then filter and process and etc., only sending on what I absolutely want to keep.

dllthomas · on June 19, 2013

Sort of a microscale map-reduce.

yosefk · on June 18, 2013

I'm not saying that you can or should do this or that right now with a given piece of hardware, just what is and is not possible. The CPU on the FPGA's die could be fast, embedded GPUs could be better at algorithmic acceleration (here - wait for a few years, they have good stuff in their pipeline), OpenCL drivers could be easily available (of course portability would still be a real problem) and people could then program embedded GPUs more easily, etc. I've personally been developing accelerators integrated onto the same die as a CPU for a long time and it works nicely enough.

digikata · on June 18, 2013

FPGAs are also stuck in a lot of niches. Economically, any high volume application where they could useful also means that putting an investment into more custom silicon is then feasible to get a better price margins - ASICs, DSPs, or I recently learned a bit about metal-gate arrays which seem to be mid-way between FPGAs and hard silicon designs.

They're stuck in a power niche because they're comparatively high power vs many embedded ICs, and if you're using a lot of power, as you said more desktop chips are economic to apply.

ChuckMcM · on June 17, 2013

Nicely done, if you add a bit about clock distribution it would be perfect. Clock distribution is important because sometimes you want all the parts of your FPGA design to wait until a particular time to allow for various gate delays etc before they do the next thing. You can synchronize to a "global" clock but if you only have one global clock then you may be limited on how much of the circuit can be in your FPGA if other parts need a different clock.

That said, as FPGA vendors get closer and closer to an efficient mix of hard and programmable gates, their utility gets higher and higher. That increases volume and helps get prices lower. I've mentioned the Zynq-7000 (which I'm playing with using a Zedboard[1]) which is dual ARM9 cores and an FPGA which can drive several HDMI 1080p/60 displays. Other systems with fully "soft" CPUs use definable instructions to optimize code execution but that hasn't been as useful as I expected. Back when Xerox PARC was building the 'D' machines it was great fun when a new release of Mesa microcode came out because everything would get faster or more compact (rarely together though ;-)

[1] http://www.zedboard.org -- of course I'm currently fighting a tool/license issue with the Xilinx tools but once that gets sorted I'll actually be building new designs on it.

nwhitehead · on June 17, 2013

This article reminds me how much fun it is to program an FPGA. It's one of those mind-expanding exercises like learning logic programming or writing a purely functional persistent data structure. "Describing hardware" is quite a different way of looking at computation.

In case you want to try, I had good success playing around with the Altera DE2. It's got lots of "goodies" on the dev board and there are lots of existing class projects, class notes, etc. that use it. It's fun to get your own tiny CPU working and have it flash HELLO WORLD on the LCD display. You have to worry about bugs in your CPU as well as bugs in the program running on your CPU.

rwg · on June 17, 2013

The downside of the DE2 boards is that they're really stinking expensive if you don't qualify for the academic pricing.

I'd love to have a go at Verilog and/or VHDL, but I'm turned off by all of the sub-$100 FPGA dev boards because they basically just break out all of the FPGA's I/O pins to 0.1" headers and leave doing anything useful (as far as I/O goes) as an exercise for the reader.

daniel-cussen · on June 17, 2013

There's the Mojo at kickstarter.

http://www.kickstarter.com/projects/1106670630/mojo-digital-...

duskwuff · on June 18, 2013

…which only has a single button for input, eight LEDs for output, and a couple comm lines with the AVR. Everything else is just broken out to headers.

8note · on June 18, 2013

Design wise, a $100 FPGA board with a bunch of peripherals won't actually be able to run many of them at once.

I have however designed a ~ $150 or so board (for prototyping) that has 10 bidirectional serial ports, a fast USB 2.0 port, and 20-30 general IO on 2mm headers, and runs on either a high or low precision 27MHz clock.

The rev 2 one also has an ethernet port, a TiWi module, and a RAM chip, and a better/cheaper clock, but those features add on a bunch of cost. Connecting the RAM makes for a much more expensive pcb.

snom380 · on June 17, 2013

How about this? http://dx.com/p/fpga-ask2ca-8-diy-learning-development-board...

zhemao · on June 18, 2013

You could also try the DE2's "little brothers", the DE1 and DE0, which cost considerably less but have less logic elements, DSP blocks, and peripherals.

Jemaclus · on June 17, 2013

It'd help if someone defined FPGA before the 12th paragraph... I like to learn things, but when you start out assuming I know what the acronym means, then you lose me before I get to the good stuff. :(

whatshisface · on June 17, 2013

Field-programmable gate array.

Jemaclus · on June 18, 2013

Yeah, I got that at the 12th paragraph. :(

csense · on June 17, 2013

Question for people familiar with the Parallella project [1]: I see the board design has an FPGA. If I buy a Parallella, can I take the Verilog source code in this article, run it through some toolchain, and actually execute it on the board?

[1] http://www.parallella.org/board/

vidarh · on June 17, 2013

You should be able to. The SoC used for the Parallella is a Xilinx Zynq-7020 which is available on various FPGA dev boards too.

The Parallella will use some of the FPGA for various glue logic, but the code in question either is or will be open sourced - they're being very good about opening up whatever they can (since their goal after all is to get people to build stuff that'll drive orders for their chips going forwards)

Symmetry · on June 17, 2013

I've heard that there's been some progress in compiling Haskell to Verilog, but I don't know how far that effort has gone. Certainly more declarative languages seem much more suited to this than imperative ones. I have no idea how practical this is at the moment, though:

http://cufp.org/conference/sessions/2012/peter-braam-paralle...

amirhirsch · on June 17, 2013

http://www.bluespec.com/ was based on Haskell. Polymorphism is awesome for FPGA design: it is incredibly satisfying to quickly turn a fixed-point FFT to floating-point and see the area/speed/power difference from the FPGA synthesizer

tim_hutton · on June 17, 2013

More context about how FPGAs compare with GPGPU: http://blog.streamingcores.com/index.php?/archives/20-Progra...

vishal0123 · on June 18, 2013

> No need for a full cycle for simple operations: on FPGAs, you don't have to sacrifice a full cycle to do a simple operation, like an OR, which has a delay much shorter than a full cycle. Instead, you can feed OR's output immediately to the next operation, say, AND, without going through registers. You can chain quite a few of these, as long as their delays add up to less than a cycle. With most processors, you'd end up "burning" a full cycle on each of these operations.

This may be a dumb question, but how do FPGA execute multiple NON-PARALLEL simple instruction in 1 cycle. I always thought a cycle is undividable and can be used only for a single instruction.

bhb916 · on June 18, 2013

That's not a dumb question at all. The answer is simple: by cramming for logic in between registers. Since you (or at least the tools) are in complete control of the routing, you can place as much logic as you would like between registers, thus accomplishing everything in one clock cycle. The trade-off, however, is that every level of logic adds delay to the path and you can't clock your registers any faster than the sum of all the delays allows (with some exceptions).

This simple diagram sort of demonstrates the concept:

http://m.eet.com/media/1176731/sync-vs-async-01.gif

e12e · on June 17, 2013

It's interesting to see how this fits with Chuck Moore's idea of miniature Forth computers:

http://www.greenarraychips.com/home/products/index.html

While not equivalent to FPGAs it does seem like a natural direction if what is desired is similar performance, but with more higher level tooling. That eval board/chip is for instance able to drive VGA using parts of the resources, leaving the rest for other stuff:

http://colorforth.com/video.htm

yosefk · on June 18, 2013

Verilog is higher-level than a bunch of nodes talking only to their neighbors and having a tiny amount of RAM, by far...

kragen · on June 18, 2013

I think the key difference is that there's already a whole lot of stuff available in Verilog from opencores, and basically nothing for the GreenArrays chips beyond what comes with them. It might be feasible to compile Verilog to the GA chips, but sadly, they don't seem to be working on that.

But I don't think you can so easily make a "higher-level"/"lower-level" distinction between the two. On one of the GA cores, you could quite reasonably execute a recursive algorithm — as long as it doesn't need more than 64 words of RAM — in a few lines of code. You can do runtime code generation. You can execute extremely irregular algorithms almost as easily as extremely regular ones.

But I think their failure to be compatible with anything else has doomed them.

InclinedPlane · on June 18, 2013

Just wait. When memristor technology matures it should be possible to create FPGA-like devices with feature density and clock speeds similar to ASICs (e.g. gigahertz, gigatransistors) with reconfiguration times on the same time scale as memory writes. That ought to cause some fairly significant ripples in computing.

kragen · on June 18, 2013

Hmm, I think memristors failed. The devices HP is promoting as "memristors" now have nothing to do with their original memristors. I don't know why. Maybe the devices started to fail after 100 write cycles or something; totally plausible for physics that depend on ion migration. Maybe someone will invent a new device that does what their original memristors did, but in a form that can be brought to market.

marshray · on June 17, 2013

FPGAs aren't going to take off until there's an open source toolchain to program them.

Sure, vendors provide free-as-in-beer licenses for their smaller chips and one-off development boards. But the generally useful version will set you back $2 - $3K.

astrodust · on June 17, 2013

This has been the biggest pain point for me trying to use the MojoBoard (http://embeddedmicro.com). It's a decent FPGA, but the tooling is entirely reliant on the Windows or Linux ISE program and you're almost entirely dependent on the IDE. The "loader" is disjointed from the compiler, too, as the loader is board specific.

If there was an open-source FPGA core with an open-source compiler, it'd be a lot easier to work with these things.

mncolinlee · on June 17, 2013

GPGPUs have been a lot more interesting with the rapid pace of development in recent years, but there are still software optimizations that are only feasible on an FPGA.

They're one of the most interesting pieces of hardware a programmer can use to develop high-performance hacks.

When I worked at Cray, our team found numerous ways to make them useful. I've been meaning to pick one up for a long time.

zw123456 · on June 17, 2013

I think the author makes a number of good points. I think in the future, it may be possible for machine code to be converted directly to RTL (the equivalent of machine code for FPGA's) without going through the intermediate steps of converting C code to a multi-state machine in Verilog or similar types of coding. If this happens then a CPU or OS, could use an FPGA as a Co-processing Cache of sorts, in the same way that often accessed data in placed in higher speed cache RAM; often executed pieces of code could be placed in the FPGA co-processor. If that happens, I believe we could see the same kind of order of magnitude jump in performance we see due to memory cache happen in processing speeds.

taylorbuley · on June 17, 2013

Here's the FPGA from Adafruit: http://www.adafruit.com/products/451

At the time they wrote "we searched around for a fpga board we really liked, here's the one we picked" (https://plus.google.com/+ladyada/posts/VZ1zyTF73oM)

Here's an example demo using that hardware: http://learn.adafruit.com/fpga-rgb-matrix/overview

fnordfnordfnord · on June 17, 2013

Source for that is Terasic http://www.terasic.com.tw They do reference designs for Altera, and have quite a few neat dev boards.

micheljansen · on June 17, 2013

I think the increasing popularity of functional programming or at least less imperative-style programming is making FPGAs more accessible.

I remember my first brush with VHDL during my first year of CompSci and having a hard time adjusting to the fact that what you are writing is not a program that is executed step by step (whether line by line or instruction by instruction), but a description of hardware components/functions that all "run" simultaneously.

Now that my brain is more comfortable thinking in terms of functional solutions, this all makes a lot more sense, but I haven't revisited FPGA programming since.

yosefk · on June 17, 2013

Functional style indeed works fine with simple data flow graphs, but once you start fiddling with RAM you're back to imperative nightmares from hell (unless you're wielding a sufficiently smart compiler that I haven't seen, of course.)

swamp40 · on June 18, 2013

Somehow managed to get thru 20 years of hardware design w/o yet using an FPGA.

Keep looking at them (esp. the IGLOO series), but keep finding some better suited or easier to implement alternative.

Jach · on June 17, 2013

MyHDL plug: http://www.myhdl.org/doku.php Don't use Verilog or VHDL, use Python and compile to both while avoiding Verilog's pitfalls! http://www.sigasi.com/content/pitfalls-for-circuit-girls

DennisP · on June 17, 2013

This may be a dumb question, but would FPGAs be a good choice for neural networks? Or is a GPU already ideal for that?

kyzyl · on June 18, 2013

Both can be made to work quite well for various ANN implementations. Currently GPGPU is a lot more popular because it requires far less hardware knowledge to get going, which translates into much quicker development cycles. For example, the boom of deep learning which has been going on since ~2002 has been largely couched in the massively parallel training of neural networks on GPUs. Some clever folks have written libraries like Theano, which allows you to create computation graphs in python that can be compiled, targeting either CPU or GPU.

In principle FPGAs can be an extremely attractive option, because a well designed implementation could run very fast as scale very well, given that their massively parallel architecture lends itself to the parallel nature of ANNs. A few groups have gone to lengths to implement, for example, a Restricted Boltzmann Machine architecture directly on FPGA, and in raw benchmarks it works well. However, you start to run into practical limitations if you try to apply those archs to real-world problems, because all but the most expensive FPGAs have trouble fitting very large neural net architectures on-chip. Furthermore, the data required to train the nets is often too big to fit in RAM on the chips, meaning you're left with network or SD-card type approaches, which can be cumbersome and slow. I can recall at least one instance where the design explicitly addressed this issue by designing the architecture to be distributed over many hardware linked FPGAs, but even then you still run into the usual parallelism and concurrency issues: you need to balance parallelism overhead against speedups, you run into synchronization issues, and things can become non-deterministic and very difficult to debug.

The bottom line is that it doesn't look like neural network FPGAs or ASICs are there yet. I'm sure there will be more exploration of this space in the future. As it stands right now, neural network training is dominated by GPGPU implementations.

sporkologist · on June 17, 2013

Looks like they scale easily in parallel from what I read.

Here's a book on it: https://www.springer.com/engineering/circuits+%26+systems/bo...

yosefk · on June 17, 2013

There are academic people mapping neural nets on FPGAs; I'd expect FPGAs to do better if you manage to use integer arithmetic.

namank · on June 17, 2013

Programmers don't give a crap about the reasons mentioned in the post. So, if FPGAs do take off in the dev community, it'll be for different reasons.

Such as when 80% of the world's code can be, with accuracy, can be parsed into HDLs.

VikingCoder · on June 17, 2013

I'm a programmer, and I care about delivering great performance at reasonable price to my clients. If I had a set of problem in an application where an FPGA would benefit the customer, I'd seriously consider targeting one.

namank · on June 17, 2013

Would you trade off developer time to figure out how many registers to use, how many pipeline stages you want every single time you need to write code? How many of your developers are apt at figuring that out? And that's a decision you have to make every time you want to add a feature because FPGA architecture is, more often than not, dependent on the business logic you want it to encapsulate. It's cheaper to get an Arduino instead.

FPGAs look very attractive when you're used to comparing C#/Python/Java/everything to C but that fades away as soon as the complexity in managing code on FPGAs comes to light.

All the points mentioned in the article under "performance"? It's left up to you, the developer, to configure the board in such a way that you can leverage them. And that's before you've written a single line of the business logic.

In theory, FPGAs give you the power to code your business logic as circuits (making it monumentally faster) but in practice, the story is very different. You may want to look at High Frequency Trading - they use FPGAs for stock analysis there.

VikingCoder · on June 17, 2013

I worked at one of the first companies to ship a program with OpenGL ARB Fragment Shaders, so yeah, we traded developer time for customer performance. Managing the code was an utter nightmare, and we developed a ton of custom tools to try to tame that beast. If we'd had something like LLVM available at the time, it would have been a godsend.

And yes, getting performance out of the video card was left up to us, and that was a beast.

I'd think the single most important factor would be the data you had, and the operations you needed to perform on them. If your business logic was amenable to FPGA, then it might make sense...

whatshisface · on June 17, 2013

When you need an FPGA, you need an FGPA. I don't think anyone would spec out one gate at a time if C and an operating system could cut it.

sliverstorm · on June 17, 2013

For example, I once purchased an oscilloscope that could measure up to ... I forget, but it was probably either 200MHz or 500MHz. Anyway, it was a hobbyist scope, so the manufacturer could not afford custom silicon. Nor, I presume, could the speed requirements be met by a DSP chip. They used a microprocessor mated to an FPGA to do signal capture & processing, which was fed to the microprocessor.

namank · on June 17, 2013

But that's not the contention here; those applications are outside the domain of most programmers, including embedded developers.

sliverstorm · on June 17, 2013

"embedded" encompasses a host of different applications- many of which either could benefit from an FPGA, or might benefit from prototyping on an FPGA for custom silicon.

namank · on June 17, 2013

Somewhat true.

I think the OP is targeted at programmers en masse. So, if we were to pick majority of embedded programmers and see if their application would benefit directly from an FPGA, I'm fairly certain my previous post would come in to play.

upwardbound · on June 19, 2013

I am a programmer and I definitely give a crap about the techniques taught in the post. I believe we can use FPGAs to implement zero-latency "chasing the scan" augmented reality.

ctdonath · on June 18, 2013

Identical articles were common 15 years ago.

ChikkaChiChi · on June 17, 2013

Are they referring to FPGAs in a mass-production/manufacturing sense or for the hobbiest/garage hacker?

I guess I'm unsure why I would use an FPGA over say an Arduino microcontroller.

VLM · on June 18, 2013

"I guess I'm unsure why I would use an FPGA over say an Arduino microcontroller."

Its unfortunate no one replied. First you upload an arduino reference design into the FPGA and continue working normally. This takes up about 1% of the FPGA gates, and leaves about 200 unused I/O pins (depending on FPGA device of course) and it probably runs faster than the real thing. (It also probably draws about 10 times the current...).

Next, instead of you writing a bit-banged I2C interface, you download, include, configure, and synthesize a smart semi-intelligent I2C peripheral into your synthetic arduino on a I/O pin that doesn't exist in the real thing. Now your arduino source code to debug shrunk by perhaps 200 lines and the I2C is faster. Now your FPGA is 2% full and 98% empty.

Hmm, what if instead of polling that pin in my arduino and/or going interrupt based, I simply included a whole picoblaze CPU core to do nothing but baby sit that input pin while outputting the correct servo signal... And the arduino sets it up and controls it as a very smart peripheral. Well, download, include, configure, synth, upload... Now your FPGA is like 7% used and 93% empty.

Eventually your virtual arduino does pretty much nothing but UI. You can replace that with a CPU core of your choosing and OS and language of your choosing, simply upload it instead of the virtual arduino.

Repeat a zillion iterations and thats how you started with an arduino and ended up with a MIPS UI written in C++ (why? Why not?) controlling a 8051 based thermostat coprocessor using a smart I2C controller to talk to the temp sensor and a smart servo pulsetrain generator to automatically make the servo signal.

Or perhaps it gradually iteratively moves from an arduino to what amounts to a specific model of the MIPS based Microchip PIC32. Why? Who knows maybe you want a single chip $3 solution instead of an $30 arduino solution, or a $300 FPGA solution. Because your FPGA model of a PIC32 is pretty close to the real thing, the porting of your software from the virtual PIC32 to a real PIC32 only takes 15 minutes or so... So you end up not actually using a FPGA in the final product even though it started with an Arduino and ended up on a PIC32.

dchichkov · on June 17, 2013

If you want to get your hands dirty, there is a pretty good eval kit available at Lattice, $100 gives you a PCI express board with GbE port, 1GBit DDR3 1330 and software...

sliverstorm · on June 17, 2013

This?

http://www.latticesemi.com/products/developmenthardware/deve...

alecdibble · on June 17, 2013

I believe the next big jump in FPGA utilization will involve high-level synthesis. Verilog and VHDL are essentially analogous to assembly languages in terms of abstraction.

kryten · on June 17, 2013

I was using Xilinx FPGAs in 1998 and they were saying next year then for such a thing.

It's 15 years later and it's still the same with shitty overpriced tools, per-vendor ghettos and crazy per-unit prices. Granted we've got a couple of ARM cores instead of a single PPC 603 and a clock speed ramp but that's it.

If they snap out of that crap, perhaps, one day we'll reach utopia.

Then again, fucking rich engineering companies over for wads of cash is better for business.

sliverstorm · on June 17, 2013

I don't know that it's by choice. The market is small, and the problems are not easy. Software compilers have had tons of effort, money, and mathematical knowledge thrown at them for the past several decades. HDL-to-FPGA synthesis and place/route on the other hand has been driven by only a few companies, and hasn't been around nearly as long.

alecdibble · on June 18, 2013

It's definitely a chicken-and-egg problem. All I meant by my comment is that I can't see FPGAs taking off if people are stuck connecting together IP or using Verilog/VHDL. It's just too low level.

anon859486896 · on June 17, 2013

I think that is true, but I think there is another problem as well. There is no, to my knowledge, equivalent of GCC for an FPGA, and by that I mean a free, open source, cross platform synthesizer. You are stuck using their tools, and they are generally suboptimal at best when it comes to the IDEs at least.

I think one of the main problems is that it is the FPGA manufacturers that also provide the toolchains, and that is your only option.

Edit: Also, forgot to add. I am not sure FPGAs are that ameniable to using higher level languages simply because you can make a trivial adjustment to your code and resource usuage will jump from 1% to 27%, which is not trivial. It used to be the same with your normal CPUs, every cycle counted, this will improve with time of course.

swah · on June 17, 2013

That will happen at the same time PS3 games are written in Haskell instead of C++ :)

guiomie · on June 18, 2013

What would be a good approach to implement a web server in an fpga ? (verilog or vhdl? xillinx or altera? ...etc) It is something I have been thinking a lot lately.

yosefk · on June 18, 2013

Run on the on-chip CPU :) (Which bits would you want to implement in Verilog? As to Verilog vs VHDL - Verilog, always :)

VLM · on June 18, 2013

Correction that I'd use the on-chip CPU and make the on-FPGA ethernet controller smarter and smarter over time until the on chip CPU isn't doing much other than telling a terrifically smart TCP accelerating ethernet interface to look at a certain array every time someone connects to port 80.

Doing that in one jump would be pretty dumb. I'd start with FPGA "hardware" acceleration of ethernet header checksum. Then start accelerating the IP headers. Then start accelerating the TCP headers.

upwardbound · on June 19, 2013

Maybe you'd want to write a super-fast template language evaluator. And give it the ability to call database queries.

ChrisNorstrom · on June 18, 2013

(By the way, FPGA = Field Programmable Gate Array)

_h8ft · on June 18, 2013

I bought one, never got around to using it, and sold it a few years later. Like a lot of cool technology, it will go underused due to difficulty of use / lack of good, easy-to-use applications. And for all the things I want to do, it's easier to just write software.

twasfm · on June 17, 2013

This is written by the C++ FQA guy which is an amusing and honest look at C++

VLM · on June 18, 2013

There are aspects of the whole technology that were avoided in the article, or edited out for length, or whatever. Regardless of reason, here's some filled in blind spots:

One important ignored aspect is you can treat the FPGA as a magic universal microcontroller. Oh, you need 3 UARTs now, I'll INCLUDE another at a different I/O addrs. And you'd like a CANbus now? No problemo in 10 minutes you'll be programming on the same hardware now with a CANbus. Oh you don't want to use a Z80 anymore, fine I'll compile the FPGA to be some pic variant in 5 minutes, or a microblaze, or whatever you want as a CPU core. As long as you slowly successively approximate real available microcontroller hardware, you can ease the eventual port off the FPGA to some microcontroller hardware that costs $1.50 in qty instead of $15.00 in qty. Its a lot easier, faster, and cheaper to upload a bitstream to a FPGA than to solder in a different microcontroller hardware. This makes device selection during development much less critical/scary. You don't have to sit in paralysis wondering if you need 3 or 4 I2C busses.

Another thing is not mentioning the open development community. Most HN readers probably hang out at github. The place where most FPGA folks hang out is opencores.org. People who exclusively hang out on github are going to wonder why "no one" is using FPGAs and theres "nothing out there" and "not much activity". Well come on over to opencores and you'll get more than a lifetime of entertainment ahead of you, for free... Its like going to CPAN and hanging out with CPAN people and wondering why "no one" uses Python. Looking in wrong spot. Need a more target rich environment.

You can do a lot of "stuff" with FPGAs as the underlying technology without doing much of anything with HDLs. Much as I once programmed on a Silicon on Sapphire CPU a long time ago, but the underlying technology simply didn't matter at the assembly language level. I'm sure it was very exciting on the factory floor and in the R+D labs, but I wasn't there, so I didn't much care. In a similar way you can synth up a perfectly good 6502 and give it twenty semi-intelligent ethernet interfaces without too much HDL effort and then just do your "software thing" on the synthesized device without touching the HDL.

Tying it all together, you'll get people outside the community reinventing the wheel, trying to slowly and methodically invent the concept, purpose, and implementation of the WISHBONE interconnect bus standard or something like that. Well, all that stuff got figured out and implemented inside the community like a decade ago, so... Much like the old saying about people refusing to use unix inevitably end up poorly reimplementing it.

xradionut · on June 18, 2013

Software Defined Radio is a good FPGA application: http://www.srl-llc.com/

mrmagooey · on June 17, 2013

Now someone should open source a verilog tcp/ip and http stack, and maybe benchmark it against something like varnish.