Hacker News

Moore's law is quite amazing.

I believe there are 4 reasons why GPUs have taken off in mainstream computing while FPGAs have not:

1. Last I checked, the FPGA vendors will not open their toolchains up, and not even document the bitstream formats. They will claim NDA, proprietary, etc. This has the massive side effect that you are stuck with their bloated, slow, crappy toolchains. If this were open, I guarantee hackers would be inventing all kinds of interesting ways to convert their software into FPGA bits.

2. FPGAs are VERY hard to write and debug. You have to write your design in an HDL (either VHDL or Verilog), and you have to prototype the design in a software simulator first (and of course these tools are either quite pricey or, if free, usually limited or hard to use). Only then can you synthesize the design and download it into the FPGA to run it.

The next problem is debugging your design. The entire internal state of the FPGA is only accessible through slow scan, unless you dedicate a portion of your design to "monitors", which tap the traffic and store their values into internal RAMs. So you may have to respin the design just to get more monitors to debug where the issue is.

3. FPGA compilation is SLOW. When I used them professionally a few years ago, a Virtex-5 could take multiple hours to resynthesize/place & route a medium-sized design. I believe the Virtex-7 they are advertising could take over a day to respin if you change your design.

4. Most new machines already have built-in graphics with a GPU that can be used for general-purpose computation. No one ships FPGAs in any conventional computer.




(2) can probably be addressed with OpenCL -- Altera seems to be working on an SDK[1] that allows you to write C code which, as I understand it, would compile to an image that you could then program onto your FPGA (or you could just compile for execution on a processor). So fortunately, no Verilog or VHDL necessary.

(3) is another issue, but I don't think the consumer would necessarily need to worry about compilation. The developer would just include the compiled programming files for different FPGAs in the application.

If you mean that it'll be slow on the developer's side, that's definitely a valid point. I'm sure, however, that you'll see FPGA manufacturers start to move toward remote compilations so that you're not necessarily limited by the hardware you have in-house.

[1] http://www.altera.com/products/software/opencl/opencl-index....


More about (3):

Altera calls it "LogicLock"; Xilinx has a different term. The idea is that you don't need to re-synthesize the entire FPGA for every change. In fact, you may not want to: if you are tweaking a certain region, you're usually happier if place & route doesn't send your signals through a route that then displaces a line in another block so that the timing in that other block is now off.

For an FPGA, timing is how you measure performance, and getting the best timing can take quite a bit of work. Being able to lock it down once you've got it right is a big plus.


Xilinx's ISE calls it "SmartGuide".

It's kind of a mixed bag. It's worked okay for me if changes are truly minor, but if there are large changes to the logic it doesn't seem to be very good about "forgetting" what it learned from the previous pass. Three or four times this week I've had a design fail to make timing with SmartGuide, but work when doing P&R from scratch.


(2) is called high-level synthesis (HLS). While a great idea, and quite practical, it does not free you from the burden of understanding the FPGA. Generally you have to make your C code fit a very rigid format that compiles to a pipeline or similar -- the advantage is that you can test it as C code, not that you can take off-the-shelf code and run it on an FPGA. There would be no point to that -- a hard processor will always be faster at running arbitrary C code.
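As a rough illustration of that rigid format (a hypothetical sketch, not any particular vendor's API; `TAPS` and `fir_filter` are made-up names, and the pragma is shown only as the kind of directive HLS tools key off):

```c
#include <assert.h>

#define TAPS 4  /* trip count fixed at compile time, as HLS requires */

/* A C function shaped the way HLS tools want it: static array sizes,
 * a fixed-bound loop, no dynamic allocation or pointer chasing, so
 * the loop body can be unrolled into a hardware pipeline. */
int fir_filter(const int sample[TAPS], const int coeff[TAPS])
{
    int acc = 0;
    /* #pragma HLS PIPELINE -- tools key off directives like this */
    for (int i = 0; i < TAPS; i++)
        acc += sample[i] * coeff[i];  /* one multiply-accumulate per tap */
    return acc;
}
```

Arbitrary C (recursion, malloc, function pointers) simply won't map to gates; that's the sense in which you still have to understand the FPGA.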

In addition, most of these tools compile to HDL, so they only add to compilation time.


Regarding (3), remote compilation: Altera is lifting the hardware burden from developers with remote compilation in the cloud. See cloud.altera.com for Altera's tool extension.


Interesting points. However, it's a bit hard to complain about it being slow (#3) considering the enormous complexity involved, no? I remember compiling things on a 486 (when I got into Linux) being pretty damn slow as well for anything substantial.

I don't mean to excuse the manufacturers, but at the same time they seem to be selling into a pretty small market and it's not clear to me that opening things up will magically lead to a big expansion in chip sales that will negate the competitive risk of being the first to open up. If you have time, I'd like to learn more about this since you seem to have a lot of experience with this technology.


I am certainly not expecting compiling HDL to be the same as compiling software, as yes, it is drastically more complicated. However, compile times are growing more than linearly with respect to the number of gates (or logic blocks) in each FPGA, whereas software compile times grow roughly linearly as programs get larger. You also have the other dimension of getting your block timed, so it can run on the FPGA at a guaranteed frequency. This is definitely a non-trivial problem, but one that I believe would be better solved by some hackers if there were a "GCC" for Verilog FPGA synthesis.

In my opinion there is very little for the manufacturers to gain by keeping their bitstream formats proprietary and undocumented. I don't think there is a competitive advantage, as all the manufacturers are pretty much doing the same thing. And their FPGA block diagrams are already open and documented (you can see how many flip flops, clocks, and muxes are in each logic cell, how the routing works, and where the memory cells and other units are).


I have only passing familiarity with FPGAs, so perhaps you can excuse my ignorance.

I was under the impression that FPGA vendors often license functional blocks (like PCIe SERDES) to FPGA users. Might it be that part of the purpose of obscuring the bitstream format is to make it more difficult for customers to use those functional blocks without paying the toll?


The bitstream format, at least for Xilinx, is not that obscure. It was actually documented in an old application note.


Take a look at VTR (formerly VPR): http://code.google.com/p/vtr-verilog-to-routing/. It's an academically developed tool for doing FPGA place and route. At the end of the day, you'll still need to use the proprietary tools to convert to the appropriate bitstream, but this is an open-source solution for the "heavy lifting" portion. However, last I checked, the solutions produced by VPR aren't as good as the commercial tools.


> However, last I checked the solutions produced by VPR aren't as good as the commercial tools.

Well, that's no surprise: FPGA vendors spend a lot of manpower on improving their Place&Route software. If you wanted to build something competitive, you'd need a lot of money plus access to proprietary, non-public, information.


Bingo. FPGA routing is NP-hard; compiling software is generally in P.


If you take the travelling salesman problem, you can dramatically simplify the problem by constraining the salesman to visit all of the cities within the same state sequentially.

Similarly, you can reduce the complexity of routing calculations by applying some constraints. You will potentially lose the possibility of an optimal solution, but you will gain a far faster compilation time. As always with engineering, it's a trade-off.
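A back-of-the-envelope sketch of that trade-off (purely illustrative; `unconstrained` and `constrained` are made-up names): with 12 cities an exhaustive search considers 12! orderings, but grouping them into 3 "states" of 4 cities leaves only 3! state orderings times (4!)^3 within-state orderings.

```c
#include <assert.h>

/* Count how many orderings an exhaustive search would examine,
 * with and without the stay-within-a-state constraint. */
static unsigned long long factorial(unsigned n)
{
    unsigned long long f = 1;
    while (n > 1)
        f *= n--;
    return f;
}

/* Visit all cities in any order: n! possibilities. */
unsigned long long unconstrained(unsigned cities)
{
    return factorial(cities);
}

/* Finish each state before the next: order the states (states!),
 * then order the cities inside each state (per_state! each). */
unsigned long long constrained(unsigned states, unsigned per_state)
{
    unsigned long long within = 1;
    for (unsigned s = 0; s < states; s++)
        within *= factorial(per_state);
    return factorial(states) * within;
}
```

For 12 cities that is 479,001,600 orderings unconstrained vs. 82,944 constrained (3 states of 4), a 5775x reduction, at the cost of possibly missing the optimum.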


Yep, global vs. local routers are kinda like stay-within-the-state.

One of my favorite ideas on this is space-filling curves: http://www2.isye.gatech.edu/~jjb/mow/mow.pdf
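A minimal sketch of the idea, using a Z-order (Morton) curve rather than the Hilbert curve the paper uses, since Z-order is just bit interleaving (`spread_bits` and `zorder_key` are illustrative names): sort your points by this key and nearby points in the plane tend to end up adjacent in the ordering, which gives a cheap tour.

```c
#include <assert.h>
#include <stdint.h>

/* Spread the low 16 bits of v out to even bit positions:
 * bits abcd become 0a0b0c0d. */
static uint32_t spread_bits(uint32_t v)
{
    v &= 0x0000ffffu;
    v = (v ^ (v << 8)) & 0x00ff00ffu;
    v = (v ^ (v << 4)) & 0x0f0f0f0fu;
    v = (v ^ (v << 2)) & 0x33333333u;
    v = (v ^ (v << 1)) & 0x55555555u;
    return v;
}

/* Z-order key: interleave the bits of x and y, so that sorting by
 * the key walks a space-filling curve over the grid. */
uint32_t zorder_key(uint32_t x, uint32_t y)
{
    return spread_bits(x) | (spread_bits(y) << 1);
}
```

A Hilbert curve does the same job with slightly better locality, at the cost of a less trivial index computation.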


> a bit hard to complain about it being slow (#3) considering the enormous complexity involved, no?

It is if the original premise was to make Photoshop filters fast. A GPU can make my Photoshop filters fast now; an FPGA implementation can make them fast 8 to 24 hours from now.


Not if you upload precompiled bitstreams into the FPGA.


Laying out components on a chip and routing non-overlapping edges (i.e. wires) is called "orthogonal edge routing". Graph drawing algorithms don't get much attention outside their niche (oddly, to me at least), but this is one area where they have profound importance.


I've made a note of the term "orthogonal edge routing", hopefully for eventual incorporation into my own software (http://www.nitrogenlogic.com/docs/palace/). Thanks.


"You have to write your design in an HDL language"

What???? There is a very mature set of tools for converting MATLAB to HDL.

http://www.mathworks.com/products/hdl-coder/

http://www.mathworks.com/products/hdl-verifier/

http://www.mathworks.com/products/filterhdl/

I'm sure there are similar tools for other languages. It requires you to program in a slightly different way (certain operations aren't optimal for FPGAs), but is extremely user-friendly.

At the company I work for, all FPGA programming is done in MATLAB. Unfortunately I work in a different department so I can't give you any technical details, but from what I understand, no one has written anything directly in HDL in years.


    Some people, when confronted with a problem, think
    "I know, I'll use MATLAB." Now they have two problems.
(Apologies to jwz.)


It depends on what you are doing. There are tools in the Xilinx tool set that let you design filters and other things in MATLAB Simulink, which are very convenient. But if you want to write a processor, however small, you cannot really do that well, because the hard part is getting a good design with proper timing and synchronization between the different parts. I think it's harder to get such a good design using MATLAB/C-to-HDL tools than it is designing directly in HDL.


The GPU vendors don't exactly have open toolchains either. OpenCL being open does not mean we get to write assembler for the GPU... it is only a little better.

But GPUs got shipped in volume. I think they were just cheaper for the performance level.


Not sure how relevant it is, but isn't PTX available through LLVM?


LLVM has an open-source PTX backend, and newer versions of the official CUDA compiler use LLVM to generate PTX internally, but PTX is a device-independent intermediate layer, and the PTX-to-SASS compiler is closed-source.


2. HLS tools exist[0][1][2] to convert C to HDL for FPGA programming, and their results are used in production designs. As @jangray says below, debug tools are sold by commercial vendors as a value-added capability for production teams; it's not a "freemium" market.

[0] http://en.wikipedia.org/wiki/High-level_synthesis [1] http://www.xilinx.com/products/design-tools/vivado/integrati... [2] http://www.synopsys.com/Systems/BlockDesign/HLS/Pages/defaul...


I agree with your reasons.

Also, GPUs might be a better match for the kinds of codes people care about. The world didn't need arbitrary bit-level computations except in rare cases; it needed insane memory bandwidth and high floating-point throughput (exactly what a Photoshop filter would need). The generality of most FPGAs means they're not great for standard circuits that can be optimized. Maybe this means the FPGA market might see some success with a different trade-off of flexibility vs. fixed hardware; the rise of FPGAs like the Zynq, with dedicated processors, distributed RAM, and DSP units, is already happening.


1. is interesting. I always wondered if a completely open FPGA vendor would have any chance in the market.


I'm reminded of Viva from Starbridge which was supposed to make FPGA "programs" easier to build and debug - using large generic blocks.

I have no idea what happened to them, but I suppose the problem was harder than they believed or at least claimed.


It got bought by Data I/O and renamed Azido. I've used it briefly and it made me beg to go back to Verilog.


Wow. That really says something (to me anyway), begging to go back to Verilog.

There are a lot of pain points in the HDLs, but it seems like Verilog has more than the others.

I saw someone working on a Clojure HDL; I think it compiled down to or emitted Verilog. I thought it was more confusing than the HDLs to begin with, but depending on one's background it might make more sense.


Probably only 4. matters.


I'm of the opinion that #2-#4 flow from #1. Fix #1 and the rest will eventually go away, so #1 is the one that matters.

My reasoning for each:

#2. More open tools would allow alternative programming models. For example, gcc already has a VHDL front end. Why not a gcc back end for an FPGA? That would open the door to more familiar languages.

#3. More open tools and specifications would allow programmers to start optimising and rethinking the FPGA compilation process, potentially leading to radical reductions in run times.

#4. People won't want FPGAs in their machines until they are easy to use. Solving #1 (and consequently #2 and #3) will make FPGAs easier to use, increasing demand and prompting manufacturers to consider including programmable logic in their machines. Granted, it's a chicken-and-egg situation between adoption and better tools, but opening up the tools and specifications could break the cycle.


#2: Intermediate representations for hardware are vastly different from intermediate representations for software. That's because the execution model for hardware is vastly different from the execution model for software. You'd need a Sufficiently Smart Compiler(TM) to convert from the latter to the former and get even the slightest amount of efficiency (in general – I'm not talking about specialized DSP-filter-to-HDL tools).

#3: No. The information needed for synthesis (HDL to netlist), mapping, and placing is publicly available. These topics are actively researched, yet so far no truly usable open-source tool has emerged. Routing tools, though, are not possible without information that is not publicly available.

#4: Sure, solving #1-3 would make FPGAs easier to use, but #2 and #3 don't follow from #1 even if #1 was satisfied.


Regarding your comment on an "FPGA backend" for GCC, you have to understand that simulating a VHDL design (what is implemented) is a drastically simpler task than synthesizing an FPGA image. Logic optimization, place and route, timing analysis--these are things entirely out of the scope of the GCC project, and the details differ significantly between FPGA vendors and between an individual vendor's products. It just isn't a realistic goal.


Agreed, that it is outside the current scope of gcc. Given that gcc is able to handle a simulation, that would indicate that gcc's intermediate representation is able to capture the semantics of a VHDL netlist? That's where I'm starting from.

Assuming the above, I'm thinking of a project, independent of gcc, that takes gcc's intermediate representation and does all the FPGA specific tasks that you mention. Yes, it would be a huge project, comparable in scope to gcc itself, and even that might be an underestimate. It could start small, to make it realistic, then incrementally expand its scope, just like linux and gcc did. Eventually, the FPGA vendors might have to choose between participation or losing customers? It might be able to exploit some of gcc's backend infrastructure in the FPGA process, but who knows?

> It just isn't a realistic goal.

Or it's a red rag to a bull, to the right person. :-)


> #3. More open tools and specifications would allow programmers to start optimising and rethinking the FPGA compilation process, potentially leading to radical reductions in run times.

Including using an FPGA to accelerate that process -- probably possible since from what I know there is a lot of parallelism involved in synthesising logic.


> there is a lot of parallelism involved in synthesising logic.

But not the kind of parallelism that's fast on an FPGA.



