(retracted, can't find exact part on Digi-Key) Quantity 1 pricing is about $2000/chip; however, that price would no doubt drop substantially at even a 1000-chip order. A typical discount appears to be 20% at qty 100; so at qty 1000, maybe 33%? So one of these chips might be $1200 after you figure in a bunch of other discounts?
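Making that guesswork explicit (all of these tiers and numbers are the comment's own ballpark figures, not real distributor pricing):

```python
# Rough volume-discount arithmetic from the guess above: ~$2000 at
# qty 1, ~20% off at qty 100, maybe ~33% off at qty 1000. All of
# these tiers are guesses, not real Digi-Key pricing.
UNIT_PRICE = 2000.0
DISCOUNTS = {1: 0.00, 100: 0.20, 1000: 0.33}  # qty threshold -> discount

def price_per_chip(qty):
    """Per-chip price at the largest discount tier not exceeding qty."""
    tier = max(q for q in DISCOUNTS if q <= qty)
    return UNIT_PRICE * (1 - DISCOUNTS[tier])

print(price_per_chip(1))            # 2000.0
print(round(price_per_chip(1000)))  # 1340
```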
The XCVU9P is listed as having ~6,800 DSP slices and 2,586K (= ~2.5 million) system logic cells. That pretty much matches what Amazon describes them as using.
(The DDR-4 RAM is external - so it doesn't help narrow down the device.)
However, there are multiple, very similarly named parts, and the XCVU9P doesn't seem to be listed on Digi-Key. Other parts named UltraScale are - thus the confusion.
These aren't the basic VU9x's. The specs put them somewhere around the highest-end part in that series: the XCVU37P, with 64GB DDR4 and 2.8 million cells/7k DSPs.
That isn't a $2k card attached over fabric... It's probably closer to a $40,000 card.
I think the parent comment to yours (or at least the edited version) is right; they are almost surely XCVU9Ps based on the logic element and DSP counts. The RAM numbers listed in the product table are embedded memories (BlockRAMs in Xilinx-speak); DDR4 wouldn't be a spec feature of the FPGA, as it's external to the part on the PCB.
Avnet web pricing is $27k for the -1 speed grade, extended (not industrial) with 12-week lead. Safe to say Amazon is getting a better deal on both counts.
Edit: Oh yeah, I forgot to add: I thought it was funny that Amazon's page referred to "logic elements" given that this was traditionally an Altera term - Xilinx preferred/prefers "logic cells".
https://aws.amazon.com/ec2/instance-types/, ctrl+f "F1". They're vague about the exact part number (none of the listed Xilinx parts perfectly match, so it's likely they got the first run of these specific parts or something[1]; the prices are just eyeballed from the listed parts' pages), but the basic specs are there, including the RAM, DSP and logic element counts.
[1] Xilinx also released Vivado 2017.1 today which is required for these UltraScale+ Virtex devices, so I'm assuming this isn't a coincidence.
I thought they'd be in the thousands but $40,000!? Holy crap! Is there a good reference you have breaking down the families, sizes, and prices all in one place?
Not really, I sort of just ballparked it (pulled it out of my ass) based on the available numbers you can get from vendors like Digi-Key or Arrow. Go search out the UltraScale+ Virtex devices there, in individual quantities, and they'll easily range into the $20k space at the higher end.[1]
These particular cards seem to be otherwise unobtainable and the series isn't exactly listed anywhere I can see. They're clearly at the high end, close to those listed ones, though.
Plus, and someone correct me because I could be very, very wrong -- those $20k numbers are only for the literal chips, I think, not counting the necessary "surrounding" device. You get bare surface-mount parts. So the actual connecting fabric and everything else is going to drive the price up even higher.
That said, I doubt Amazon is actually paying $30-40k a device... I bet they get absurd discounts at these volumes; only lowly individual purchasers would pay that price. (Aside from just literally buying in huge quantity, there's probably some element where they can say "We are Amazon, give us deals.")
That pricing sounds waaaay too low to me; these are big, cutting edge parts, basically unobtainium at the moment unless you have a close relationship with Xilinx. When they reach the channel they will not be cheap either.
I checked pricing for qty 1 via digikey.com in USD, choosing the part that was about $2k as it seemed closest to the specs mentioned in the joint Xilinx/Amazon announcement.
You must have been comparing to something else entirely! The cheapest Ultrascale part that Digi-Key carries is XCVU065 at $6800, and that isn't even in the right part family.
Here's how it looks[1], with further links to the product down the comment chain. The picture I posted there shows the exact product with the correct specs. I don't remember the exact pricing, but I think it was about $9k.
Yeah, I too was mystified that they didn't bother to mention this on the landing page, or even explain what it is for newbies who are unfamiliar with FPGAs.
If you don't know what an FPGA is, you are probably not going to be able to do much with one. It's kind of like expecting the Ubuntu homepage to explain what Linux is, or the Django website to explain what Python is.
I agree with you. It's a complex topic that takes a long time to learn and use effectively. It costs plenty of money. We don't need a site on development kits aimed at practitioners to explain the basics of what they are. We have other sites for that.
Not for the entry level. The Icestorm [1] tools are free software, in every sense of the word, and can take you all the way from Verilog source (or VHDL, using GHDL's analyser [2]) to a bitstream. An associated board, such as an iCEstick or iCE40-HX8K, costs $40 or less. (Cheap enough that a friendly sales rep gave me one for free on request.)
The iCEstick is probably quite useful for general-purpose hardware interfacing tasks, etc but I doubt you'll be doing much in the way of FPGA compute on one of those.
The project is about AWS using high-end FPGAs. It's going to assume users have money and have already learned the stuff, since they pay as they go. Whereas, for a page introducing FPGAs, what you referenced would be good stuff to have on it, especially in addition to Spartan boards or a list of cheap kits.
Just wanted to make it clear to the casual reader, who was considering learning about FPGAs, that "plenty of money" doesn't have to be the case. It would be a pity if we lost a newbie :-)
Now now, don't be talking about how things are fun and might appeal to newbies or children. Gatekeeping is Serious Business and is not a fit subject for play or enjoyment.
Yes, but that's not very helpful to people who are just becoming aware of such technology. There is some room between explaining things in elementary terms like a tutorial and being cryptic to the point of hostility (see: lots of Unix man pages).
The #1 question I find myself asking on tech-related landing pages is 'why might I be interested in learning about this?' because people so often write documentation aimed purely at their own level of knowledge, to the point that they sometimes forget that it might be of interest to anyone else, or even to mention What It's Good For.
It's not a whole lot of extra effort to say 'FPGAs get us closer to the performance of dedicated hardware while maintaining much of the flexibility of software. If you have ever enjoyed building electronic circuits then you may love programming FPGAs.'
It's kind of like expecting the Ubuntu homepage to explain what Linux is
And why would it be such a bad thing for it to say something like 'Linux is a wonderful operating system that anyone can use or modify for their own purposes, and it's free!' In fact, it does say something along those lines:
Ubuntu is an open source software platform that runs from the cloud, to the smartphone, to all your things.
Is there something bad about being informative and accessible?
I'm curious as to how AWS plans to prevent users from generating malicious FPGA bitcode that physically damages the FPGA itself and/or the host machine over PCI Express. The possibility of instantiating arbitrary logic gates in the cloud seems very dangerous.
Great question! I checked in with the team and this is what they told me:
"The developer FPGA code is enclaved inside AWS FPGA Shell, to prevent malicious FPGA code from damaging the hardware and to provide the necessary protection for PCI Express and the host machine. The pin assignment of the FPGA is controlled by AWS."
And
"AWS infrastructure monitors the thermals as well and the F1 hardware was designed to sustain high power consumption to enable developers to utilize the maximal available FPGA resources and frequency."
I spent a few minutes poking around their project scripts, and it looks like the design file you upload back to EC2 is a post-routed design checkpoint, which is an interesting choice... I'll bet that they are doing partial reconfiguration with the I/O ring as the "static" layer.
I have not read the whole FPGA HDK guide yet, but I doubt Amazon will give the designer access to pins on the FPGA; they would give access to interfaces or I/O blocks which connect to the pins.
Your design sits inside a wrapper with access to I/O blocks.
Inside the FPGA logic, you cannot do electrical damage, no matter how hard you try.
Inside the FPGA logic, you cannot do electrical damage, no matter how hard you try.
I'd be a little worried about people making ring oscillators and driving them from the fastest possible output from a DCM. That used to be a hazard, not sure if it still would be at the UltraScale level of play.
Or creating a configurable DCM or PLL and then setting it up with illegal values at runtime, when the synthesis tool won't notice...
What's so bad about high-Z inputs? Designing around a floating input is the same as designing around an undefined variable. Bad design but you can't "hurt" the hardware.
An undefined variable is an abstract construct. You can model a floating input as being analogous to an undefined variable, and the model will fit reality most of the time, but the underlying reality is still governed by the complexities of physics, with the potential for the model to break down.
Damage is probably more of a risk for an external I/O, but it's still a possibility for an internal I/O. Probably more of a certainty with a malicious programmer.
Who's going to be the first person to design an on-die switched capacitor voltage multiplier, using parasitic capacitance between gates as the energy storage, and use it to drive an internal high-impedance input to a damaging level?
which has a much smaller FPGA on it. However, the advantage of this board is that the entire toolchain you can use to program it is open source.
This board is a little more powerful (but still way less powerful than the AWS FPGA), and will use the same basic toolset from Xilinx (EDIT: see comment and recommendation for a better board, below from aseipp):
The LISPM FPGA implementation may work on this board, but I have not bought one yet to test it.
----
Looking at it more, I think I was so sure it was a $2K part because I thought the AWS instance price was under $2 per hour - there would have been no way they could make their money back!
Now that I see the price is about $14 per hour, it makes sense that (making a guess they won't always be utilized) they could in fact recoup their costs after about 18 months or 2 years.
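That payback guess is easy to sanity-check with back-of-envelope arithmetic (the $40k hardware cost and $14/hour price are both the ballpark figures floated in this thread, not confirmed numbers):

```python
# Back-of-envelope payback time for one F1 FPGA, using the ballpark
# numbers from this thread: ~$40k hardware cost, ~$14/hour billed.
HARDWARE_COST = 40_000   # USD, guessed above
HOURLY_RATE = 14         # USD/hour, approximate instance price
HOURS_PER_MONTH = 24 * 30

def months_to_recoup(utilization):
    """Months of billing needed to cover the hardware at a given utilization."""
    monthly_revenue = HOURLY_RATE * HOURS_PER_MONTH * utilization
    return HARDWARE_COST / monthly_revenue

print(round(months_to_recoup(1.0), 1))  # 4.0  -> ~4 months if rented 24/7
print(round(months_to_recoup(0.2), 1))  # 19.8 -> ~20 months at 20% utilization
```

So the "18 months to 2 years" estimate corresponds to guessing roughly 20-25% average utilization.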
FWIW, that Spartan-6 Mimas v2 board will not use the same toolset as the one on AWS. Spartan-6 FPGAs are older and require the "Xilinx ISE" EDA tools, which are the older, no-longer-developed tools they created. Modern Xilinx FPGAs, from the 7-series onward, use "Xilinx Vivado" for development. This includes the combination ARM + FPGA devices, the 'Zynq' series.
This sounds like nitpicking, but it's a very important distinction to make, since ISE is no longer maintained and much worse than Vivado (though I never used it much). Plus, although it's not the dominant concern for many people and needs -- 7-series boards are, far and away, much more powerful than the Spartan series. If you wanted to do something like run Linux on an FPGA softcore, that's where you want to be (granted, the Mimas specifically can handle embedded Linux, like the J2 core!)
A much better Xilinx board IMO is the Digilent 'Arty', which is an Artix-7 with 4x the amount of SDRAM, 3x the logic and many more peripherals like ethernet. This thing is powerful and will be good for a lot of tasks, and the Vivado license is completely free:
You mention the Zynq, so I'd like to add that Digilent seems to have recently released their Zynq-based 'Arty Z7' boards, which provide a dual-core ARM on the same chip as the FPGA and are also supported by the free Vivado design tools.
I forgot they released the Z7. The main reason I recommended the ordinary Arty is because I have one sitting here :) It's also a bit cheaper, it seems... Digilent will apparently be releasing a Spartan-7 Arty variant soon, too. (No clue what the new Spartans will have to offer...)
TBH I'm not sure whether new people should really jump into combo SoCs, since there's a lot more to manage in some sense. OTOH, having an ARM processor at hand makes some interfacing tasks actually practical!
As far as combo SoCs go, this one IMO seems to be the best deal, especially as the $200 'Black' board gives you a free SDSoC license and a lot of connectivity: https://www.crowdsupply.com/krtkl/snickerdoodle No ethernet, but a breakout is coming...
That board definitely looks like the better deal, especially since it has on-board 10/100 ethernet, making it much easier to communicate with.
Could a small design (or perhaps a design meant to be parallel) be prototyped on the Arty and then re-compiled using the same toolchain for the much larger FPGAs on the AWS platform?
It is really, really, really going to depend on the design in question. But, in short: probably not.
If you are creating some design that is essentially agnostic of the underlying peripherals, just generic, reusable RTL -- for example, perhaps you will write a CPU core, like picorv32, or softusb-navre -- then the same RTL will, in fact, work just fine, just about anywhere. Provided you don't use vendor- or device-specific code, naturally.
But that isn't normally what you do. A CPU core by itself is worthless unless it's driving some on-board component.
The problem is when any actual peripherals are involved -- which is to say, "every realistic design you'd need to put on the F1 instance." Doing things like talking to an ethernet controller or PCIe bus is, in practice, going to vary immensely based on the board, controllers, and design you're implementing. Furthermore, boards are going to have their own quirks and errata, etc. A lot of the work is spent just dealing with the hardware.
For example, the Arty has DDR3, not DDR4, much less ECC RAM, so you'd probably need a different SDRAM controller. There's no PCIe, so you don't have an option for a PCIe bridge with the Arty; it's impossible to even experiment there.
There are other things. As an example, FPGAs come with tiny pieces of RAM called "BlockRAMs", which are available to the logic, spread out over the device, independent of any other memory. But, the FPGA in the F1 has an even better component -- UltraRAMs, with more space and "input" ports -- that aren't available on older devices. So you cannot use or test the impact of these components easily. In practice, this is all incredibly important as it will impact the power usage and timing of your design. (BRAMs are a very important resource to utilize effectively.)
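To make the capacity difference concrete, here's a toy calculation using the standard UltraScale+ block sizes (36 Kb per BlockRAM, 288 Kb per UltraRAM; the 1 MB buffer is just an example):

```python
import math

# On-chip memory block sizes (bits) for Xilinx UltraScale+ parts:
# a BlockRAM holds 36 Kb, an UltraRAM holds 288 Kb.
BRAM_BITS = 36 * 1024
URAM_BITS = 288 * 1024

def blocks_needed(buffer_kbytes, block_bits):
    """Minimum number of memory blocks to hold a buffer (capacity only,
    ignoring port-width and placement constraints)."""
    return math.ceil(buffer_kbytes * 1024 * 8 / block_bits)

# A 1 MB on-chip buffer:
print(blocks_needed(1024, BRAM_BITS))  # 228 BlockRAMs
print(blocks_needed(1024, URAM_BITS))  # 29 UltraRAMs
```

Fewer, denser blocks means less routing pressure and an easier time closing timing, which is why a design tuned around UltraRAMs won't map cleanly back onto a BRAM-only device.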
This isn't to say it's all wasted effort that you redo over and over. The cost of a port to a new board will definitely be amortized by having support for an existing board in place, and having a working, tested design. But the F1 is a high-class piece of hardware that would be extremely hard to replicate for anyone working independently, and you have to spend a lot of time with the hardware. So it's going to be difficult to make sure everything works coherently without spending a boatload of money.
The only device in a similar class I can think of, which is also the most affordable UltraScale+ device I know of that doesn't require $4000 for Vivado, is the new UltraScale+ Zynq SoC available from Avnet. For "only" $700 USD for a SoC + carrier card: http://microzed.org/product/ultrazed-EG
An ARM+FPGA device is still a much different beast. But it's a lot closer to what the F1 is, in some senses.
Huh, it surprised me that this got attention rather than the blog post. You should check that out[1], since that is the more appropriate place for general discussion.
Here's a link[2] to previous discussion from when it was first announced.
I'd love to hear what people are using the f1 instances for. I would have guessed that most systems with the scale to make FPGA development economical would also make operating a datacenter economical and thus not be running on AWS. (But I don't know very much about FPGAs, and Amazon biz devs are no dummies, so I'm sure there are plenty of use cases.)
Doesn't Chisel (in theory) support all FPGA tools out of the box? Chisel compiles to Verilog[1], so all you would need to do is import the resulting Verilog into the Xilinx toolchain, then test and synthesize it.
[1]: The full process is Chisel -> Firrtl -> Verilog, which is analogous to C++ -> LLVM IR -> ASM.
Yeah, Chisel shouldn't have a problem with that part. I use Clash (the Haskell equivalent, but not a DSL) quite a lot with a variety of FPGA tools and it tends to work pretty well.
The real task, of course, is binding up all those IP interfaces into nice type-safe Chisel interfaces for users... That's always a huge pain, especially for a device of this class -- where it's going to be PCIe and ethernet interfaces you want to use...
I've only worked with Verilog but Chisel and Clash are fascinating to me. A lot of people seem to use Chisel, including the people related to RISC-V development.
Refer to jeffbarr's answer above.
The FPGA pins are connected to the host CPU via PCIe Gen3 and to 4 local DDR4 channels per FPGA; if you are using the f1.16xlarge, there are also pins connecting the FPGAs to each other.
Both f1.2xlarge and f1.16xlarge have NVMe SSDs, attached as PCIe devices to the host and not connected directly to the FPGA. One could consider using standard Linux NVMe drivers, or SPDK user-space drivers, for high-throughput, low-latency data movement between the NVMe SSD and the FPGA.
For some types of processing, it's vastly more efficient to implement specially-designed digital circuits to do the work instead of using a regular CPU. If the need is high-volume enough, these can be fabricated on custom silicon chips. Common examples are DSPs, GPUs, and custom Bitcoin mining chips.
For low-volume applications where it's not cost efficient to fabricate custom chips, there's a specialized type of "generic" chip known as a field-programmable gate array or FPGA. These FPGAs contain a grid of digital logic gates (to oversimplify a bit) and programmable interconnections that allow them to be configured to create any type of digital circuit. While it can't run as fast as a fully custom silicon chip, it's still fast enough to get a tremendous speedup where a custom digital circuit design is beneficial.
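A toy software model may help here: the "grid of gates" is really a grid of small lookup tables (LUTs), and configuring the FPGA amounts to filling in those tables. A minimal Python sketch of one LUT:

```python
# Toy model of an FPGA lookup table (LUT): a k-input LUT is just a
# 2^k-entry truth table, so "configuring" it means filling in the table.
def make_lut(truth_table):
    """Return a gate function from a truth table indexed by the input bits."""
    def lut(*bits):
        index = 0
        for b in bits:
            index = (index << 1) | b
        return truth_table[index]
    return lut

# "Program" a 2-input LUT as XOR by writing its truth table:
xor = make_lut([0, 1, 1, 0])
print(xor(0, 1))  # 1
print(xor(1, 1))  # 0
```

A real device has hundreds of thousands of these, plus the programmable wiring between them; the bitstream you upload is essentially all of those truth tables and routing switches at once.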
Now that Amazon has made EC2 instances with FPGA accelerator cards available, ordinary users who need these custom digital circuits have access to such specialized devices without having to make the enormous upfront cash investment of purchasing and operating servers with FPGA accelerator cards themselves.
Addendum: It's also worth noting, as Moore's law slowly grinds to a halt, that building custom digital circuits using FPGAs for specific processing needs is one of the few promising ways remaining to get a big boost in compute-intensive app performance. By making these available to a wide audience, Amazon is effectively accelerating the speed at which this type of technology will reach ordinary desktop computing.
The primary competitor for FPGAs at the moment is not so much CPUs as GPUs. Will be interesting to see what kind of systems will end up using FPGAs in datacenters.
Low-latency inference using neural networks could be one. Especially if the practice of "quantizing" networks (using <32-bit integers), as Google does with their TPU chips, takes off.
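For the curious, the quantization trick can be sketched in a few lines. This is a generic symmetric int8 scheme, not Google's actual implementation:

```python
# Minimal sketch of symmetric 8-bit quantization: map float weights
# to int8 values plus a single scale factor, and back.
def quantize_int8(weights):
    """Quantize a list of floats to int8-range values and a scale factor."""
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03]
q, scale = quantize_int8(weights)   # q is [50, -127, 3]
approx = dequantize(q, scale)       # close to the original weights
```

The appeal for FPGAs (and ASICs) is that 8-bit multipliers are far cheaper in silicon than 32-bit floating-point ones, so the same fabric fits many more of them, at a small cost in precision.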
I highly doubt there is a way to explain this as if you're five. But the very short version of it is that if you need a lot of very large FPGAs for a short while or right now, then Amazon will now rent them (up to 8 in one box) to you in an environment that you hopefully are already familiar with. There are some (obvious) limitations: you're not going to be accessing the I/O pins directly, but you'll be able to access all of the internal logic and a bunch of memory to create custom co-processors and other nifty stuff without breaking the bank. I can't find a price, but it won't be cheap, and if you're just going to toy around with FPGAs you're better off buying one of the small kits.
Even though you program an FPGA it is not the same as programming a processor, the 'software' tells the FPGA what to be, not what to run. Huge difference, and as a programmer you're initially going to be about as comfortable as a fish on dry land.
I have limited experience with FPGA programming, but I am wondering if you could translate a TensorFlow model (or part of it) via XLA into an FPGA bitstream and get a serious speedup compared to using GPUs.
You might be able to create something along the lines of Google's TPU in a large enough FPGA (with a large enough memory bank attached), but it would cost a small fortune to run, likely without enough benefit over just renting a GPU instance instead. But it would be very interesting to see how far that could be pushed and what size models could be run on it.
GPUs are hard to beat on price, and the only reason Google made the TPU in the first place is because it is an ASIC, which has a much better price/performance ratio (once you make enough of them) than either a GPU or an FPGA, at the cost of not being able to change the design easily.
EDIT:
https://www.xilinx.com/products/silicon-devices/fpga/virtex-...
^^^ the product table.