Just to check my understanding -- there's a 10x10 mm active interposer here, which has 64 "management-class" (MMU-less, I assume) cores, a crapload of interconnect, and a decent amount of I/O; and this can support up to a 5x5 array of chiplets bonded on top, each of which (of the current three) is either a quad core "application-class" processor (but still in-order), a tiny little FPGA (big question seems to be the width/capability of the DSP blocks), or something AI-ish that doesn't really interest me. Yes?
One question that immediately comes to mind is if there are a few pins on the interposer that are basically pass-through to the 2 x 2 mm chiplet on top, so that a chiplet that provides some level of I/O beyond what the interposer already has can be supported. I assume the UCIe links or something can be used as generic high speed SERDESes for that sort of use case, but more thinking about radar FEs and such.
In similar manner: "what about silicon photonics?"
IIUC fiber transceivers have become better than baseband copper differential pair transceivers for same-rack cluster interconnects.
And the main issue with them is packaging difficulty and the need for a cheap link to the switching fabric data plane, if these benefits are to remain strong after accounting for the overhead that results from not being able to integrate the PHY on the same die as the fabric data plane.
Both efficiency and cost.
Yes, you are right that pseudo-integrated silicon photonics is really exciting! The Ayar Labs + Intel FPGA chiplet combo based on AIB is a great example. Packaging is one of the really big challenges; DARPA invested ~$100M to solve this problem with the PIPES program.
It's a 4x4 array of chiplets; the rest is UCIe/IO logic. The ebricks do have some pass-throughs, but real IO capability is done through UCIe-based chiplets.
Thanks. I'm trying to think of cases where (assuming no use of the AI stuff) this actually ends up cheaper/smaller than just embedding 16x a tile's compute cores (or a smaller number of much higher performance cores) and 16x a tile's FPGA fabric... with such a small number of options and small number of tiles, I'd suspect (but could easily be wrong) that the area overhead for bonding is on the same order of magnitude as the actual logic here. And while having a large number of fast cores scattered through FPGA fabric would be /relatively/ novel, it seems like it would support the type of flexible configurability that's needed here. I could imagine that this grows in power with a growing ecosystem of tiles, but I'm stretching my mind here for tiles that actually need to be ASICs instead of just implemented in fabric.
Yeah, I have been struggling with these questions for 25 years and I am not alone. :-) You can always spend $100M and build exactly what you need (e.g. 16 RISC-V processors with custom vector extensions and a selection of I/O, HBM3, serdes, ...). If you are lucky with your application fit (cost, power, performance) you could possibly buy a Snapdragon, Xavier, Versal, or Agilex. The problem we are addressing is the gap, where you don't have $100M in your pocket and you are constrained by physics (size, weight, power) or cost.
This is probably not the place to write a full post-mortem, I will save that for a blog post or something, but here are a few points:
* 2008 was a really bad time to launch a chip startup
* Epiphany predated RISC-V...nobody wants a new ISA
* With <$7M raised in 9 years it was starved; there was no way to turn the architecture into a good market fit (this would have required $50-$100M in tapeouts, CPU licensing, IO licensing, sw, sw, sw, ...). This is the problem we are trying to address now.
Epiphany did find some customer end applications (machine vision for example) and the Parallella board shipped in volume. The volumes and number of customers just weren't big enough. Everybody wants to be the next ARM or Nvidia, but there is only room for 1-2 on earth.
I guess a few people got the board to do something useful, but the vast majority of Parallella boards are sitting in a box in the bottom of a closet somewhere.
All the machine vision sensor companies use FPGAs, which burn a huge amount of power: something like 5 W when it could be milliwatts.
I hope these guys have some ASIC modules for interfacing with CMOS global shutter sensors. Damn, if they did I might start a machine vision cam company myself.
The challenge with today's FPGAs is that you rarely get exactly what you need. The small ones aren't feature complete and the big ones are expensive and power hungry. Lots of questions though: which interface parts are you talking about? The electrical interface (LVDS, MIPI, CMOS, etc.), width, resolution, vendor type, or some kind of custom ISP like you would have in a high-end camera?
Do you have a good explanation for why it should be milliwatts? I have here an older Zynq project with 2 ARM cores running a normal operating system and an image processing part written in VHDL: 7 watts. But it is a normal computer with a normal operating system. It's on par with typical Raspberry Pi consumption.
Not the most interesting question, but: "log in with LinkedIn" as the only option? Really?
I get that you want to generate leads, but offering the least trustworthy social network as the only login option, just to save 2 seconds pasting a name into the LinkedIn search box, isn't a good idea. Many of us are forced to use LinkedIn for career reasons, but try to keep it isolated from everything else, because we remember when it scanned people's contacts without permission and other dodgy behaviour.
Thanks for the feedback! Since our business model is b2b and we are giving away emulation time on some very expensive cloud FPGAs, we figure the automated linkedin auth was fair. With Parallella we had 10K mostly anonymous customers and for a startup that was not great...
LinkedIn is a great choice for a login. I had a b2b site, and at first we offered Google, LinkedIn, etc.
Almost everyone used LinkedIn, and of course the profiles were so much more helpful, so we simplified our process and used LinkedIn only. We never had any issue with that; it's a great choice.
Not really. Yeah, none of the options are great. I'd prefer a nominal payment TBH, but many people would probably not be keen on that (as expensing it could be an issue)
I didn't realise you were giving time away on FPGAs. Probably the best option would be to have a lighter preview that just shows the UX but not simulation, for anonymous users.
Unless you really expect your business to be mostly high-touch sales, giving your potential customers as much information as possible before you demand anything from them is probably the best strategy
One option I've seen work for a similar requirement is email verification on any corporate address. All you want to know is that they're legitimately connected to a legitimate prospective customer, just make them answer an e-mail on the company's domain.
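A minimal sketch of the first screening step for that idea, assuming a small hand-picked list of free-mail providers (the list and the function name `corporate_domain` are illustrative, not anything the site actually uses). It only filters the address; the actual verification would still be a confirmation e-mail that the user has to answer.

```python
# Hypothetical helper: decide whether an address looks like a company address.
# FREE_MAIL_DOMAINS is an illustrative sample, not a complete list.
FREE_MAIL_DOMAINS = {"gmail.com", "outlook.com", "yahoo.com", "hotmail.com"}

def corporate_domain(address: str):
    """Return the lower-cased domain if the address is not free-mail, else None."""
    local, sep, domain = address.strip().rpartition("@")
    domain = domain.lower()
    # Reject addresses with no "@", an empty local part, or a dot-less domain.
    if not sep or not local or "." not in domain:
        return None
    if domain in FREE_MAIL_DOMAINS:
        return None
    return domain

print(corporate_domain("alice@example-corp.com"))
print(corporate_domain("bob@Gmail.com"))
```

The returned domain is what you would then match against a prospect list before sending the confirmation mail.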
I guess the interesting question is how this approach compares to FPGAs in terms of cost and performance. Given the functionality of the different chiplets as soft IP, wiring them together and loading the design into an FPGA should be easy enough. So how much more would I have to spend on FPGAs to implement the same functionality as a typical combination of chiplets? How much more to match the performance? Or are there no FPGAs that could fit the functionality or match the performance?
Depending on who you ask, the silicon area overhead of programmable logic FPGA is 25X-100X compared to hard coded logic of an ASIC (at the same node). Very few applications can afford the cost of full platform programmability with large devices costing over $5K.
Does this translate to: if I have a design that fits into a $10k FPGA and that can also be implemented with Zero ASIC's chiplets, I will be able to get a chip in the $100 to $400 range?
No, my bad for not being clear. If all you want is a bunch of LUTs then the high-end FPGA is the right answer. An FPGA today is a lot of other things besides PL (serdes, CPU, DSP, ML, DDR IF...) and some of those features are only available on the bigger FPGAs. FPGA portfolios have many gaps due to the enormous costs of designing each one of those FPGA products.
A $10k FPGA at Digikey price or a $10k FPGA at the direct from [vendor] purchase price? The former is a medium sized chip, and the latter is a huge one. If you have a volume product, you actually get reasonable pricing from the FPGA vendors.
I would guess if you are considering a semi-ASIC solution like Zero ASIC, then you probably want to build more than a handful of devices, so getting the FPGAs from the vendor would probably be the better comparison. What kind of price difference are we talking about here? The most expensive FPGA I could quickly find on DigiKey is an AMD Virtex UltraScale+ VU57P for $174k in quantity one. What could I realistically expect to pay if I ever needed a thousand of them? Half the price? One tenth?
The price difference for direct, volume purchases of FPGAs is reportedly about 10-50x lower than the unit price on Digikey, but they keep it very secret so they can price gouge you on negotiations. That VU57P likely costs at most $10-15k when purchased directly and in volume.
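As a rough check of that arithmetic, here is a short sketch that applies the "reportedly 10-50x" range to the $174k list price quoted above. Both numbers come from the comments in this thread, not from any vendor quote, and the function name is made up for illustration.

```python
# Back-of-the-envelope only: inputs are the figures quoted in the thread.
DIGIKEY_UNIT_PRICE_USD = 174_000  # VU57P, quantity one, as quoted above

def volume_price_range(list_price, min_discount=10, max_discount=50):
    """Return (low, high) estimated direct price for a discount-factor range."""
    return list_price / max_discount, list_price / min_discount

low, high = volume_price_range(DIGIKEY_UNIT_PRICE_USD)
print(f"Estimated direct volume price: ${low:,.0f} to ${high:,.0f}")
# prints "Estimated direct volume price: $3,480 to $17,400"
```

The 10x end lands at about $17.4k, slightly above the "$10-15k at most" guess, which would correspond to a discount of roughly 12x to 17x.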
Right now y'all look focused on digital logic somewhere between ASICs and FPGAs.
Any plans for custom chiplets? Custom analog layout might be much cheaper if done MPW or Tiny Tapeout style: design a mere ~100x100um area, then bond it to standardized chiplets for control/power.
Thanks! Yes, we are focusing on big digital because that's where the big cost problem is. Also, the kind of block-based design we are proposing requires a standardized interface. The standard interface for analog is "a wire". :-) What kind of analog did you have in mind?
What's the plan for the protocol on top of UCIe? Any support for CXL? I see you also specifically mention Chisel, so I suppose it's not a total shot in the dark to ask if you're supporting TileLink (ideally TL-C) over UCIe?
Are there chiplets from other sources? The announcement seems to indicate there is a catalog. But I imagine there are huge barriers to making chiplets from another source / made using a different process / different size and shape and contacts work (maybe? I don't really know)
And another question: The quad core RISC-V chiplet, is that 64 bit? RV64GC? Any other extensions like V, H, ...?
The swappable nature of the chiplet bricks was meant to convey that they are the equivalent of toy plastic blocks. We didn't disclose the standard yet, this will be coming soon.
My takeaway is that part of the concern is around functional units getting integrated into a SoC that are not 'safe' but might be backdoored or untrustworthy.
Having small IP cores where the source can be verified/audited, and that can then be snapped together like Lego blocks, is very appealing from both a cost savings and supply chain safety standpoint.
What does the software toolchain behind this look like? In particular, how are people expected to program the resulting hybrid/heterogeneous architecture?
Each chiplet comes with its own tool chain. The FPGA has a full RTL2BITS tool chain, the CPU is supported by standard RISC-V tools, and the ML chiplet will have a typical ML programming flow. The full system parallel heterogeneous programming challenge is something the whole industry is grappling with. It's an unsolved problem. DARPA started a program on this topic in 2020 called PAPPA.
The manycore emulation demo kind of hints at where we are going...more information to follow in the next few months.
Structured ASICs are generally fixed-size monolithic chips with metal/via programmability to control some amount of wiring (HardCopy, eASIC, ...). Modern FPGAs with a mix of hard-coded blocks (serdes, CPUs, DSP) connected by NoCs and PL are another example. The front-end transistor layers are fixed.
The Zero ASIC platform does "late binding" by wiring together different chiplets (cpu, ml, fpga, serdes, ...) in the package. Both approaches address the same problem of flexibility vs performance, but the approaches are very different.
Intel decided not to buy them after all. And they're on... uh... round F? And they no longer have a SaaS multiple. They're still cranking out the designs, but their latest ML focused design doesn't (I think) support the new narrow precision data format. From the outside it looks like they're a bit distracted.
I thought custom ASIC's were done when you have some computation that takes many instructions to implement in a CPU, for example for Bitcoin mining your computation might be "Find the sha256 of a fixed-length input and tell me if it begins with at least 32 zeros." The main benefit of using an ASIC (as opposed to an FPGA) is you can hard-wire the circuits, the FPGA has to use generic hardware (super-flexible LUT's and a super-flexible routing fabric) and you pay for all that flexibility with speed, power consumption and area.
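To make that example concrete, here is a minimal software sketch of the search as the comment describes it: hash a fixed-length input and count leading zero bits. It is a simplification (real Bitcoin mining uses double SHA-256 against a target value), and the header bytes, 4-byte nonce layout, and function names are assumptions for illustration. The demo uses 8 bits instead of 32 so it finishes instantly.

```python
import hashlib

def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a byte string."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() gives the position of the highest set bit in this byte.
        bits += 8 - byte.bit_length()
        break
    return bits

def find_nonce(header: bytes, difficulty_bits: int, max_tries: int = 1_000_000):
    """Brute-force a 4-byte nonce so sha256(header + nonce) has enough zero bits."""
    for nonce in range(max_tries):
        digest = hashlib.sha256(header + nonce.to_bytes(4, "little")).digest()
        if leading_zero_bits(digest) >= difficulty_bits:
            return nonce, digest
    return None  # gave up within max_tries

result = find_nonce(b"example-header", difficulty_bits=8)
if result is not None:
    nonce, digest = result
    print(nonce, digest.hex())
```

Every iteration of that loop is the same fixed function, which is why it maps so well onto hard-wired logic: an ASIC can replicate the hash pipeline many times without paying for LUTs or a general routing fabric.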
I guess the main benefit of the chiplet idea is that you can get a really fast, low-latency connection between the CPU and FPGA? What kinds of problems can be solved with that architecture, that can't be solved by just putting the CPU and FPGA as separate chips on the same circuit board?
Or is it cost and design simplicity, you can have a lot of chiplet elements on the same chip, and the customization is in which chiplets you want ("give me a chip with 4xFPGA and 6xCPU") and how they're connected ("CPU A should be connected to FPGA #4, CPU B should be connected to FPGA #2 and #3")?
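One way to picture that kind of request is as a small declarative spec. The sketch below is purely hypothetical: the class `ChipSpec`, the chiplet kind names, and the link list are invented for illustration and are not Zero ASIC's format; only the 4x4 slot limit comes from the reply earlier in the thread.

```python
from collections import Counter
from dataclasses import dataclass, field

GRID_SLOTS = 4 * 4  # 4x4 chiplet array, per the earlier reply in this thread

@dataclass
class ChipSpec:
    """Hypothetical description of which chiplets to place and how to link them."""
    chiplets: dict                              # instance name -> chiplet kind
    links: list = field(default_factory=list)   # (name_a, name_b) pairs

    def validate(self) -> Counter:
        """Check slot count and link endpoints; return a count per chiplet kind."""
        if len(self.chiplets) > GRID_SLOTS:
            raise ValueError(f"{len(self.chiplets)} chiplets exceed {GRID_SLOTS} slots")
        for a, b in self.links:
            for name in (a, b):
                if name not in self.chiplets:
                    raise ValueError(f"link references unknown chiplet {name!r}")
        return Counter(self.chiplets.values())

# "Give me a chip with 4xFPGA and 6xCPU", with the connections from the comment.
chiplets = {f"fpga{i}": "fpga" for i in range(1, 5)}
chiplets.update({f"cpu_{c}": "cpu" for c in "abcdef"})
spec = ChipSpec(chiplets, links=[("cpu_a", "fpga4"), ("cpu_b", "fpga2"), ("cpu_b", "fpga3")])
counts = spec.validate()
```

Ten of the sixteen slots are used here, leaving room for I/O or memory chiplets in the remaining positions.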
ASICs are very diverse with many dimensions and it depends on the definition. Pure ASICs like bitcoin miners are very rare. Most "ASICs" are generally on the programmable platform spectrum (apple socs, snapdragon, versal, ML startup chips, Nvidia, amd versal,...). Take a look at the Apple SoC or Versal Soc of heterogeneous archs with a bunch of different programmable elements integrated.
Also... there is way too much emphasis on compute; most applications are IO/memory bound. How many memory/serdes channels, total I/O BW, on-chip cache sizes, and what type of I/O are much more important than peak theoretical flops/W.
Curious to know how well this supports hybrid designs. No-code “IP block” composition is great for some stuff but there’s going to be customisations that aren’t on the standard “blocks”. That said, cool to see this emerge.
No-code IP block composition using standard interfaces has been a feature of most RTL software suites for almost a decade now. There's a reason it isn't used more widely: once your design gets complicated it gets very unwieldy.
That said, that kind of flow has people who love it.
Code (RTL, Chisel, C) that can be bundled as a memory-mapped IP and turned into a brick integrates really well. The problem is that this requires custom tooling. A full mask set tapeout is going to cost $5M-$10M and take 6 months to complete at a reasonably advanced node.
No relation. Zero ASIC predates Matt Venn's "Zero To ASIC" course. We made some announcements on Twitter and LI in 2020 to announce company formation, but Matt didn't see them. I agree that the name collision is unfortunate.
It looks like there's something interesting here, but I'm not really seeing what it is.
Can someone explain in simple terms what's the added value, compared to buying an FPGA with an embedded RISC-V core? Like, if you want maximum configurability, you already have FPGAs, and if you want maximum performance, you have ICs. Where does Zero ASIC come in? How does it push the Pareto frontier? Is it for prototyping or for final products?
If a catalog low-performance $1 microcontroller is good enough, then it's probably the right answer. A low-cost FPGA (~$5) with an embedded CPU is also a great choice.
If none of the off the shelf components can meet the application cost, power, performance, size, weight, security constraints, you have the choice of either abandoning the project or spending $10M-100M to design an ASIC. Our chiplet approach fills the gap between ASICs and off the shelf FPGAs.
Like, say I'm in a startup working on a novel microcontroller, or a novel ASIC, or a chip for a space rocket. Right now I'm prototyping on an FPGA, but performance / power / weight isn't good enough. Where does your product come in?
Like, if I'm not getting enough performance-per-power from an FPGA, how am I going to get enough from your eFabric chiplets? The underlying computation still happens on FPGAs and RISC-Vs, right?
Or is that you can embed your own custom circuits between pre-made chiplets? In that case, how do you save on capital costs (eg making the mask)? Through a fab shuttle?
case #1. If you truly have some custom IP, we could save time/effort by only chipletizing that part rather than working on the whole SoC. The design and verification effort is an exponential function of complexity (#blocks, die size). We can turn RTL into a 2mm x 2mm brick fairly easily. It would require a fab shuttle/mask sets though.
case #2. Depends on how much power goes to the PL vs other functions. For PL-dominated FPGAs that you fill up to the brim, our only value would be to help like in case #1. For multi-chip solutions (FPGA + CPU) with small amounts of PL, an approach based on a correctly designed small catalog of off-the-shelf chiplets wins.
Semiconductors used to be a very hot area for startups, but SaaS is a much easier one because:
a) There is a huge fixed cost to developing a chip
b) a start-up can usually only afford to develop one innovative block. They must then obtain the right to use many off the shelf IP blocks (CPU, interface blocks, radios) to make a complete chip product. The economics of this mean that large semiconductor companies, who have a large portfolio of blocks that they don't need to pay licence fees for, are at a huge advantage
c) As chip products have integrated more and more features on a single die, these factors have become more and more dominant
For these reasons, for some time now acquisition by a big player has been the only realistic 'exit' available to startup founders in the semiconductor area.
In theory, Zero Asic's approach could mitigate all of these, which would lead to a much more active chip startup scene.
Everything RISC-V is good. But we should always go for 64 bits even if we "waste" some "logic material", because RISC-V as a worldwide/licence-free standard is very good ground for an assembly realm: write 64-bit RISC-V once and run it on embedded/desktop/server/etc. We just need clean tables of functions for platform mobility, which can be added slowly, step by step.
The main pitfall while doing that is abusing the preprocessor, because writing "c++" using a preprocessor on top of assembly is hardly better than coding in c++ and then creating an absurdly massive and complex SDK dependency.