Judging from [1], Sam Zeloof's plan might include using electron beam lithography, which scans an electron beam over a wafer surface, instead of normal photolithography. This can reach resolution (~10nm) comparable with EUV, and could theoretically be built out of a hacked scanning electron microscope. Photolithography is the step that limits feature size, so e-beam litho could allow cheap equipment to produce transistors comparable with the state of the art.
The main problem is that e-beam litho is extremely slow. It might take ~1 day for e-beam to pattern a single layer of a 1x1cm chip, whereas an EUV machine can pattern a 300mm-diameter silicon wafer in under a minute. (The next problem is making everything reliable. A chip with billions of transistors (a modern CPU) needs a per-transistor failure rate better than 1e-9.)
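For a rough sense of that gap, here's a back-of-the-envelope sketch using the figures quoted above (both numbers are loose approximations from this thread, not tool specifications):

```python
# Rough throughput comparison; inputs are the approximate figures quoted above.
import math

ebeam_rate_cm2_per_day = 1.0      # ~1 day for a 1x1 cm chip layer, single beam
wafer_diameter_cm = 30.0          # 300 mm wafer
euv_minutes_per_wafer = 1.0       # "< 1 minute" per wafer exposure

wafer_area_cm2 = math.pi * (wafer_diameter_cm / 2) ** 2        # ~707 cm^2
euv_rate_cm2_per_day = wafer_area_cm2 * 24 * 60 / euv_minutes_per_wafer

print(f"e-beam: {ebeam_rate_cm2_per_day:.0f} cm^2/day")
print(f"EUV:    {euv_rate_cm2_per_day:,.0f} cm^2/day")
print(f"gap:    ~{euv_rate_cm2_per_day / ebeam_rate_cm2_per_day:,.0f}x")
```

On these numbers the single-beam writer is roughly six orders of magnitude behind per exposure layer, which is the gap the rest of this thread wrestles with.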
Maybe that's enough for extremely-low-volume production?
Doesn’t necessarily make it a bad idea to try again. It seems that photolithography as a process is stretched to its limits; it's easy to see the diminishing returns. Electron beam lithography is a different kind of process. It seems like it’s precise enough, just slow. That sounds like good news to me. “Make it go faster” is something we typically can achieve.
e-beam litho is anything but fast though: the machines are cheaper but they are much much slower in wafer throughput than the insanely expensive ASML EUV machines.
Now the website claims a fast fab, but leaves it open what that means: fast production of wafers? Or slow production of wafers that run fast?
If the machines could be super cheap you could make up for slow by having many run in parallel (~~not parallel beams working on same chip, since electrons deflect each other, but machines running in parallel~~).
Edit: linked below, https://www.ims.co.at/en/products/ , says it uses 512x512 beams with a beam field of only 82um. Is that spacing between beams, or width of all the beams together?
The machines themselves couldn't be "super" cheap, that's impossible. You still have to deposit the e-beam resist while keeping the wafers extremely clean. This is non-trivial.
The only route to economic viability is absolutely massive beam parallelism inside the tool. But at that scale, there's serious questions about accuracy/reliability. Just one out of hundreds of thousands (or millions) of beams fails for a microsecond and the chip is ruined. This is a problem that is effectively sidestepped for traditional litho -- the masks themselves are created by (slow) e-beam, but mask inspection tools ensure that the masks are perfect before they are actually used to process product wafers.
> You still have to deposit the e-beam resist while keeping the wafers extremely clean. This is non-trivial.
True, but this is more or less the same process for e-beam and photolithography (as I understand it). I don’t see a fundamental reason why one couldn’t replace one ASML EUV machine with, say, 1000 e-beam machines and run them all in parallel. You would need the e-beam machines to be extremely reliable, but they’re conceptually simple devices and this should be possible.
(With vague ballpark numbers from the Internet, an EUV machine appears to be about 10k times as expensive as a SEM. Building 10k e-beam machines at the same cost as one Alibaba SEM would be an interesting challenge, and there would be factors pushing the price in both directions.)
> The machines themselves couldn't be "super" cheap, that's impossible.
There are a few dimensions of cost that can be optimized though, right? My understanding is that ASML is making ~10s of these EUV machines per year because of the extreme complexity of many components.
Sure. Chief among those dimensions is the fact that it's not used as a serious production technology, so the manufacturing of these systems doesn't benefit from economy of scale.
E-beam certainly does provide a bounding limit on how expensive EUV can get, but we're not in danger of hitting that limit anytime soon.
I expect that EUV will become cheaper/more productive per dollar in the medium term, unless ASML starts acting uncomfortably monopolistically (and it's probably in their interest to drive EUV adoption to starve out Nikon and Canon, anyway)
I don't think being an ML chip means the defects are necessarily less fatal. These often interfere with the actual functioning of the chip, cause shorts, etc -- it's not just a matter of the TTL being very slightly messed up somewhere.
You could imagine chips that are engineered for redundancy / defect resistance, but that would make them a lot less performant so it's highly questionable whether that can be justified by any cost savings on litho.
My short introduction to the fab industry echoes exactly this. Allowing US companies to turn around prototypes quickly is a valuable business. Perhaps they aim to get a foothold with this and slowly ramp up to high volume manufacturing.
e-beam litho just seems strange to me since electron beam 3D printing is just as fast as using lasers, so clearly the scanning part isn't the bottleneck. What is the bottleneck in this case?
You need to deposit a specific amount of energy into your resist to expose it, and that takes time: you can't just crank up the amount of charge or the energy per electron in your e-beam, as that will usually increase the energy variance and thereby the aberration. It's much easier to produce a coherent beam of light with sufficient energy than a coherent beam of electrons with comparable energy.
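As a rough illustration of that dose budget, here's a minimal sketch; the dose and beam-current values are illustrative assumptions (real values vary widely with resist, beam energy, and tool), not measured figures:

```python
# Write time = charge the resist needs / charge the beam delivers per second.
dose_uC_per_cm2 = 300.0     # assumed resist sensitivity (PMMA-class)
beam_current_nA = 2.0       # assumed usable single-beam current at high resolution
area_cm2 = 1.0              # the 1x1 cm chip discussed earlier in the thread

charge_C = dose_uC_per_cm2 * 1e-6 * area_cm2
write_time_s = charge_C / (beam_current_nA * 1e-9)
print(f"write time: {write_time_s / 3600:.0f} hours (~{write_time_s / 86400:.1f} days)")
# Raising the current shortens this, but as noted above it tends to increase
# the energy variance and blur the spot, so it can't be pushed arbitrarily far.
```

With these assumed values it lands in the ~1-2 day range quoted earlier for a 1x1 cm chip.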
> since electron beam 3D printing is just as fast as using lasers
Interesting. What sort of resolution is that 3D printing though?
> What is the bottleneck in this case?
My guess would be using a single beam? Perhaps it's possible to scale this up to multiple beams working on a die or wafer at a time?
Which brings up another interesting question. Would this process require the same kind of wafer/substrate as traditional EUV machines? Perhaps using this approach opens up the possibility of using different materials that are easier, cheaper and faster to produce?
Don't traditional kinds of wafers have to be grown and sliced from exotic/rare materials? If so, the additional time to "etch" with this new process might be offset by other factors, such as what goes into preparing the wafer?
> Interesting. What sort of resolution is that 3D printing though?
Around 50 microns I believe. Not at lithography resolutions obviously, but that's limited by metal powder grain size.
> My guess would be using a single beam?
Electron beams can scan a whole print bed very quickly to heat up the whole top layer [1] which can't be done using lasers. This can be done easily with electrons since they are deflected using magnetic coils, like good old CRT monitors, but this can't be done using lasers because they have to move the mirrors mechanically.
That's why it seemed weird that photolithography would be so much faster, but maybe it's as you say, lasers can be stacked for parallelism to make up for those downsides. Stacked electron beams might interfere with each other because you can't really isolate magnetic fields.
> Crazy they have physical masks with features as small as 7nm.
Everything about this is crazy complex, and the state of the art in any given year is also secret to TSMC and other tiny-feature-size fabs.
But in addition to gradually upping the narrow-bandwidth/phase-coherent illumination frequency every year (which has many problems but continues to see continual progress), they've also long been using techniques to work around the diffraction limit/resolution barrier [1], such as subwavelength metamaterial "hyperlenses" / "superlenses" (previously widely thought to be impossible even in theory) [2][3] and "assist features" and other non-traditional masking elements to pre-compensate for imaging distortions [4]. Plus they fiddle a lot with the chip process to tune it in weird ways to assist with or compensate for the previous issues.
That still seems pretty delicate. The lenses and mirrors would have to be aberration-free to an extreme degree so as to not introduce too many artifacts, and moving the mask further from the surface would increase risk of diffraction artifacts, no?
It is, but that's how modern semiconductor lithography is done. Massive lenses and mirrors (both made of different materials than normal, since they need to have suitable optical properties at wavelengths much shorter than visible light) are manufactured at great cost to ensure they are free of aberrations. The extremely limited supply of these is actually one of the many factors that restricts the ability to move to newer processes and scale production capability of newer lithographies.
For good telescope optics, we look for something like 1/4 - 1/6 wavelength tolerance, minimum. That's for optical wavelengths, but photolithography is in the UV range, so that's already stricter tolerance in absolute terms because of the shorter wavelength, but how does the tolerance in relative terms compare? Thanks for the info!
One youtube video[1] I watched had someone state that if you scaled up one of the curved mirrors to the size of the Earth, the largest imperfection would be the width of a hair.
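Here's a quick back-of-the-envelope comparison of those tolerances; the mirror size and hair width are illustrative assumptions, not vendor specifications:

```python
# Quarter-wave rule at different wavelengths, plus what the Earth/hair analogy implies.
visible_nm = 550.0     # reference wavelength for telescope optics
duv_nm = 193.0         # ArF DUV lithography
euv_nm = 13.5          # EUV lithography

print(f"lambda/4 at 550 nm:  {visible_nm / 4:.1f} nm (telescope rule of thumb)")
print(f"lambda/4 at 193 nm:  {duv_nm / 4:.1f} nm")
print(f"lambda/4 at 13.5 nm: {euv_nm / 4:.2f} nm")

earth_diameter_m = 12_742_000.0    # assumed scale-up target
mirror_diameter_m = 0.5            # assumed mirror size
hair_width_m = 80e-6               # assumed hair width
figure_error_m = hair_width_m * mirror_diameter_m / earth_diameter_m
print(f"implied figure error: {figure_error_m * 1e12:.0f} pm, "
      f"i.e. ~lambda/{euv_nm * 1e-9 / figure_error_m:,.0f} at the EUV wavelength")
```

If the analogy is even roughly right, the relative tolerance is orders of magnitude tighter than the quarter-wave rule used for visual telescope optics.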
> That's why it seemed weird that photolithography would be so much faster, but maybe it's as you say, lasers can be stacked for parallelism to make up for those downsides.
The reality of how this is done is so much more complex than I would have thought: https://www.youtube.com/watch?v=f0gMdGrVteI Traditional techniques such as masks don't work when dealing with xrays.
No the scanning is the bottleneck, scanning laser photolithography is equally slow. For mass production of chips, photolithography is done with a light source that illuminates a "large" area all at once.
There are some theoretical approaches that you could use to make e-beam a lot faster, and I'm not sure anyone has really explored them due to the unreasonable effectiveness of photolithography. Basically, SEMs and e-beam machines today use a low- or medium-power electron beam that they treat as a static beam, and scan slowly to keep the "static" assumption. If you instead think if it as a traveling particle stream, you may be able to "pipeline" the process of steering the beam as it travels down the microscope, allowing you to crank up the power and run the process a lot more quickly. It would be very cool to see a startup pursue super-fast e-beam and make it work, and it's a niche I'm excited to see explored.
A common approach is to use multiple electron beams in parallel ([1] is up to 262144 beams!). This is starting to be used commercially to create the masks for photolithography.
AFAIK, the best performance IMS were able to achieve with the 512x512x50nm e-beams was 1cm^2 per hour. That's acceptable for writing masks, but still not feasible for chip manufacturing, as a wafer goes through 40-50 exposures, each of which would take days to complete.
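Naively extrapolating that quoted rate to full 300mm wafers (a rough assumption, since the tool is designed for mask blanks rather than wafers):

```python
# What ~1 cm^2/hour means for a full wafer across many exposure layers.
import math

rate_cm2_per_hr = 1.0                  # quoted multi-beam writing rate
wafer_area_cm2 = math.pi * 15.0 ** 2   # 300 mm wafer, ~707 cm^2
exposure_layers = 45                   # "40-50 exposures" from above

hours_per_layer = wafer_area_cm2 / rate_cm2_per_hr
total_days = hours_per_layer * exposure_layers / 24
print(f"one layer:  ~{hours_per_layer:.0f} h (~{hours_per_layer / 24:.0f} days)")
print(f"{exposure_layers} layers: ~{total_days:.0f} days (~{total_days / 365:.1f} years) per wafer")
```

That works out to roughly a month per exposure layer and years per wafer, which is why this is confined to mask writing rather than volume production.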
Personally, I think there is an undertapped market for extremely low volume productions. The cost of a mask set is in the hundreds of thousands of dollars, and there is very little custom or domain-specific IC design.
I do think it's a slow market to emerge. They'd need very patient funding. If nothing else, tooling needs to catch up, which is 5+ years.
apart from massively-parallel beam systems as discussed elsewhere here, it seems more likely to me that e-beams could be used for mask-making, which might make it easier for smaller clients to make the jump to modern processes.
like if you can do a 7nm or 14nm tier mask maybe that becomes a pivot to a 28nm actual production process, or maybe it makes multipatterning and some of the other advanced-node tricks more accessible at a semi-reasonable cost.
I wonder if e-beam lithography would make registration easier. With photolithography, I assume one must position the wafer relative to the mask extremely precisely. With e-beam lithography, the tool is a scanning electron microscope, and as long as the wafer doesn’t move, the software could potentially locate it to essentially arbitrarily high precision and then offset and rotate the scanning pattern accordingly.
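Conceptually, that software correction is just a rigid 2D fit: measure a few alignment marks with the e-beam column, solve for the rotation and offset, and apply them to the write coordinates. A minimal sketch of that idea (a standard Kabsch/Procrustes fit; nothing here is specific to any real tool):

```python
import numpy as np

def fit_rigid_transform(design_pts, measured_pts):
    """Return R (2x2 rotation) and t such that measured ~= design @ R.T + t."""
    d_mean = design_pts.mean(axis=0)
    m_mean = measured_pts.mean(axis=0)
    H = (design_pts - d_mean).T @ (measured_pts - m_mean)   # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:        # guard against a reflection solution
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = m_mean - R @ d_mean
    return R, t

# Example: three fiducials on a wafer that is rotated ~0.1 mrad and offset slightly.
design = np.array([[0.0, 0.0], [10_000.0, 0.0], [0.0, 10_000.0]])   # um
theta = 1e-4
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
measured = design @ rot.T + np.array([3.2, -1.7])                   # um

# R recovers the 0.1 mrad rotation and t the (3.2, -1.7) um offset.
R, t = fit_rigid_transform(design, measured)
corrected_write_coords = design @ R.T + t   # pattern coordinates adjusted in software
```

In practice the correction would also need scale and higher-order distortion terms, but rotation plus offset is the core of the idea.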
ASML surely charges plenty for their alignment hardware:
From the description of a “fast fab” and this, perhaps the play is to beat photolithographic fabs on turnaround, e.g. GDS-to-first-silicon latency. An e-beam doesn’t need masks made.
I was under the impression that electron beam could fundamentally get much smaller than light (electrons are “smaller” than photons in some funky physics sense), but was just really slow.
Also (fuzzy memories of semiconductor classes, but) the size doesn’t tell the whole story, right? With photons I’m under the impression that the wavelength of the light is much longer than the feature size, so they have to do funky things with the masks to make it all work out. Playing with interference or whatever. (Someone who knows more about this can feel free to embarrass me, I’m sure it will be educational!)
Electrons are relatively speaking more like the nice little billiards ball thwacking away at the SI that we like to imagine.
1 day for (1cm)^2 is 15 minutes for (1mm)^2. That might be okay for a start -- I suppose they do not want to go for the largest chips right from the beginning.
At the N5 node 1mm^2 of a fully finished wafer is worth about 25 cents. This may have undergone hundreds of steps. If we assume the chips we're making can be done in 100 steps, one of these E-Beam Lithography machines costs the same amount as a scanning electron microscope, can run 24/7 for years, and there are zero costs to operate, and we need to break even in 2 years, then we need to charge about 450 times as much as a mass produced chip. Obviously under more realistic conditions, that multiplier would have to be way higher.
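Reproducing that break-even arithmetic under the same optimistic assumptions (the machine price below is my own illustrative guess for an "SEM-priced" writer, not a real figure):

```python
# Break-even cost per mm^2 for a single slow writer amortized over two years.
value_per_mm2_usd = 0.25        # quoted value of finished leading-edge wafer area
patterning_steps = 100          # assumed steps, each at the e-beam rate above
mins_per_mm2_per_step = 14.4    # 1 day per cm^2  ->  14.4 minutes per mm^2
machine_cost_usd = 80_000       # assumed tool price, roughly SEM territory
years = 2                       # break-even horizon, running 24/7

minutes_available = years * 365 * 24 * 60
finished_mm2 = minutes_available / (mins_per_mm2_per_step * patterning_steps)
cost_per_mm2 = machine_cost_usd / finished_mm2
print(f"finished area in {years} years: {finished_mm2:.0f} mm^2")
print(f"cost per mm^2: ${cost_per_mm2:.0f} "
      f"(~{cost_per_mm2 / value_per_mm2_usd:.0f}x the mass-produced value)")
```

With an ~$80k tool assumption this lands around 440x, in the same ballpark as the ~450x figure above; any realistic operating costs only push it higher.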
While for steady state production, a chip could be produced every couple of hours, no one is going to pay tens of thousands per chip for even a limited production run. If you are doing a one off prototype that justifies an extremely high pricetag, you have long lead times waiting for the chip to go through the various steps.
Honestly you'd be better off just making custom masks.
> We believe our team and lab can build anything. We’ve set up 3D printers, a wide array of microscopes, e-beam writers, general fabrication equipment - and whatever is missing, we’ll just invent along the way.
Could someone please explain why they can't just hire equipment from a university lab if all they want to make are small batches using old processes? Aren't there lots of commercial-use programs for university equipment? For example in Bradford, UK, there is (or was, years ago when I was there) a whole building full of startups taking advantage of the university in the medical field. For anyone who thinks universities support only cutting-edge research, let me say that one startup I did some work for there was testing anti-acne drugs on cell cultures. Definitely not cutting-edge stuff.
On the electronics side there is another org nearby (Electronics Yorkshire) where you can hire really expensive inspection equipment (like 3D x-ray machines etc.) by the hour. This is not their core business, but it is a very useful service for startups in aerospace electronics, for example. I can't imagine there are no equivalent organisations and university programs in the US.
There are two main differences between many university labs (and even the much larger dedicated research labs/fabs, like for example NY Creates in Albany) and production fabs:
- Research labs are typically equipped to focus on the critical parts of the overall production process they want to do research on. But to get actual designs manufactured you need a fully working and sufficiently yielding process from bare wafers to transistors, interconnect, passivation etc. This is easily multiple hundreds to 1000+ different process steps (just think about making a single metal layer: deposit resist, expose, etch, deposit barrier, liner and then metal, CMP).
This works well for the use cases you describe: where dedicated inspection equipment or very niche, small-scale nanofabrication is needed. But making a fully working VLSI CMOS process requires much higher complexity and many more different process steps and equipment.
Which brings me to the second point:
- Research labs typically don't care much about yield: they need a sufficient number of working devices to measure and show things are possible and publish about it, but they do not need to bring it up to reliably reproducible results. However, when you make an ASIC with millions to billions of transistors and wires, you typically depend on ALL vias and ALL transistors working, not just 80-90% of them, to get a functional chip. This is a lot of hard and dedicated work; in a way it's an art, black magic almost to most: very few people and companies have the expertise to pull this off. It's not a big exaggeration to say only TSMC can do this really well. Just look at how much even Intel is struggling to get their next node out (it's been how many years of Tick Tock Tock Tock Tock.. now?)
That's a good argument/explanation. However, based on the equipment cost they mentioned (half a million $) I assumed they wouldn't be doing VLSI chips. As you said, process reliability and efficiency is king there. I very much doubt you can do VLSI chips with Alibaba equipment for $0.5 million. My impression was they would be making rare chips with a few tens of transistors, tops (perhaps analog ICs etc.). Even with a few dozen transistors there should be many chips they could sell: simple (obsolete) logic no one makes but that is required to service old military equipment, and other simple ICs. If they could pull off a process to produce a few thousand transistors efficiently (an equivalent of 1980s tech) they could make microcontrollers etc. (flash is another story). This is getting into VLSI territory, but it is still extremely far from a powerful TPU chip. Look at Google's edge TPU. I bet they sell them so cheap only because they had some unused wafer space left near the edges when making their proper TPUs, so they squeezed in some of those edge TPUs almost for free. How can anyone with Alibaba equipment compete with that?
Sam Zeloof did a 1000-transistor chip (granted, not interconnected) with eBay/Alibaba-type stuff. VLSI would probably be 10,000 transistors or higher, so within the realm of possibility.
On the thousand steps bit, LTT did a tour of one of Intel's fabs in Israel and it's pretty incredible. Everything is automated with little bins moved via overhead rails to the different stations.
I do research in a university lab that is also used by a number of companies. People make plenty of transistors, including state-of-the-art research.
However, reasonably sized processors need millions of transistors, and (a) we can't easily make that many at competitive feature sizes, (b) it takes significant time and effort to set up and debug a process, and even more to get high yield. So while it's theoretically possible to make small processors, it's much easier (and, including labor, probably cheaper) to leave that to dedicated fabs. Small prototyping runs (via [1] or similar) are common.
Instead, people use the lab to, e.g., prototype new MEMS devices or test new types of transistors or memory cells. Once the technology is proven, it can be mass-produced elsewhere.
I can't fully answer this question because I'm not in the semiconductor fab field, but AFAIK for most medical research you're not needing to get down to the nano-scale yourself. Unless you're designing/building custom chips for imaging or BCIs, but even then I imagine any in-house prototyping is not at the final scale.
In the US there's a consortium of fabrication centers[^1], but even if a uni has a clean room, I think a lot of custom designs often go through China because it's less hassle.
Do you have any further information on how/if there are "practical" ways to get custom maskless VLSI there?
Due to the practical limitations on "direct" reticle size for maskless lithography, you already need reticle stitching for matching any modern (think, this century) mask-based photolithography VLSI capabilities, and thankfully most maskless reticle stitching tactics can be scaled to an entire wafer. [0]
Some tactics likely need mechanical re-positioning of the write head to different parts of the wafer due to Etendue limitations of commercially practical optics (keeping sharp focus across the entire optically reachable area, within which they rapidly write individual reticles at reticles-per-second rates IIUC usually somewhere in the audible range), but the same mechanism that's used to track alignment of the lithography layer to those of previous steps, can typically be adapted to work across mechanical scanning within exposure of the same lithography layer.
While wafer-scale integration naturally needs defect-compensation, modern micro-channel liquid/phase-change cooling can already handle heat removal at desktop Zen3 chiplet power densities without needing a heat spreader to thin the power density out.
Tactics like feeding in the liquid parallel to the chip, and letting it boil on the chip-side of the structure, to then let the vapor escape normal to the chip (if it's flat, this would be vertically upwards), can scale to very large areas because you can put occasional liquid feed pipes in the vapor-space that are thick enough to not cause excessive pressure loss, and thus scale from having to pass the liquid sideways across the entire chip to something with less flow resistance (pressure drop) for the escaping vapor.
3D printing can manufacture those intricate structures that allow exceeding areal power density limits of nucleate boiling (which are around 10~30 W/cm² for chemically/environmentally tame hydrocarbons (e.g. Pentane, boiling comfortably at 1bar/2bar/5bar at, respectively, 36°C/58°C/92°C, low toxicity but more flammable than gasoline), and around 100 W/cm² for water (sadly, 120°C surface temperature isn't practical for silicon CPUs)).
Wafer-scale processors are just extremely capable compared to normal reticle-limited ones, see e.g. how Cerebras manages to run fluid dynamics simulations for things like iirc helicopters at/above realtime speeds, enabling predictive control in aerodynamically unstable situations.
Allowing manufacture of processors not limited to special reticle-border-crossing wires is, IMO, quite ground-breaking. Imagine things like hex or triangle grid mesh networks on the chip, and just overall a far more homogenous mesh topology just about flexible enough to route around the defects, possibly using just the normal required back-pressure routing to deal with the congestion-hotspot from needing to divert around a disabled cell (and do so without the greater surroundings needing to even be aware of that cell being disabled). [At worst a disabled area would need to be turned into a rectangle to make adaptive Manhattan routing work. The software would need to be taught to deal with holes in the physical-location-based address space, but many algorithms are inherently tolerant enough to deliver proper useful results even with their data grid having holes/crystal defects, and a physically homogenous grid of cores (should, IMO) fit(s) those better than one less-homogenous that can fully mask deactivated cores (like Cerebras's device).]
And beyond wafer-scale processors, analog VLSI processors (they could be manufactured using traditional photolithography) are awesome.
There are just two major obstacles in the way of utilizing them:
1) they don't allow much flexibility in even simple operating parameter tuning (let alone larger FPGA-like reconfiguration) to use a generic chip in many situations/different devices. Thus they need a low practical MOQ for broad utilization.
2) due to difficulty with simulating/modeling the entire dynamic system they operate/control, substantial iterative experimental tuning of the hard-coded (though literally the shapes of the devices manufactured in the integrated circuit) parameters will be necessary during product development. Mask-less lithography is inherently able to manufacture down to single-digit MOQs (only really limited by stochastic yield/binning and multi-step-process lead times).
[It's trivial to sample the hard-coded tuning parameter space, via simulating behavior differences, and then sampling finely enough to not miss the perfect range of parameter values as a result of manufacturing yield/rejects poking holes into the sampling grid.]
I believe there are substantial opportunities in electronic power converters, due to the frequencies unlocked by recent advances in [high-frequency-capable] SiC and GaN power transistors.
And how multi-MHz switching shrinks the size of capacitors/inductors/transformers, in exchange for demanding extremely rapid control (and worse, often limiting the current of a single module due to speed-of-light effects, which implies many individual controller chips).
For reference, here's a list of commonly-encountered electronic power converters that typically aren't made in the quantities needed to make traditional analog/mixed ASICs economically feasible:
"computer power supply",
"high-efficiency electronic motor controller" (which are just fancy variable power supplies commanded by a controller that translates input commands and possibly sensor feedback into drive voltages),
"solar panel to power grid adapter" (regardless of whether the grid is a normal AC grid needing it to be an inverter, or if it's a DC grid needing it to only adapt the voltage),
"battery charger" (essentially all types, except if the voltage matching is done by an external device like how electric cars with DC fast charging only request the desired voltage from the stationary "charger" (itself "just" a variable power supply)),
etc.
[0]:
[With DLP chips, you get single-exposure pixel counts on the same scale as contemporary TFT (i.e., active, large-panel) LCDs (the kind used in flat-screen computer monitors and TVs; this stems from a large (if not the largest) market for DLP technology being projectors/beamers), and electron-beam approaches run into issues with deflection mechanism linearity (i.e., pixel spacing uniformity between center and borders) in the 1000~100000 (1k~100k) linear pixel range (width, in the fast axis).]
K8 seems pretty small after he made the fucking Zen architecture (and Tesla AI). Lex has good interviews with Jim[1]; he's not just very smart, he is wise. He started Tenstorrent some time ago with Ljubisa Bajic, so I'm assuming this fab is related. Here is the YouTube channel for Tenstorrent[2]; last time I checked it was getting under 100 views per video... Ian also interviewed him[3] some time ago.
Long story short if I had money that would be relevant in this context I would invest really hard into Tenstorrent.
Investing time to listen to the guy seems like no worse an investment.
> IC: A few people consider you 'The Father of Zen', do you think you’d scribe to that position? Or should that go to somebody else?
> JK: Perhaps one of the uncles. There were a lot of really great people on Zen. There was a methodology team that was worldwide, the SoC team was partly in Austin and partly in India, the floating-point cache was done in Colorado, the core execution front end was in Austin, the Arm front end was in Sunnyvale, and we had good technical leaders. I was in daily communication for a while with Suzanne Plummer and Steve Hale, who kind of built the front end of the Zen core, and the Colorado team. It was really good people. Mike Clark's a great architect, so we had a lot of fun, and success. Success has a lot of authors - failure has one. So that was a success. Then some teams stepped up - we moved Excavator to the Boston team, where they took over finishing the design and the physical stuff, Harry Fair and his guys did a great job on that. So there were some fairly stressful organizational changes that we did, going through that. The team all came together, so I think there was a lot of camaraderie in it. So I won't claim to be the ‘father’ - I was brought in, you know, as the instigator and the chief nudge, but part architect part transformational leader. That was fun.
When Michael Schumacher won the race and became Formula One Champion, they asked him over the radio how he felt. His response: "We did it, Ross, you are the best!" Archetypically speaking, his and Keller's behaviour shows they embody the good king.
Or maybe the other people involved are really, genuinely, just as important as him and there's no king, there's a team.
It's human nature to pick out heroes, say they did everything by themselves, and idolize them. That doesn't mean it's real. It's just a story you're telling yourself.
I've never heard about the people mentioned here, and I don't know anything about semiconductor manufacturing. Except that it's one of the absolute most complicated things humans have ever attempted and there's no way a single person could be the creator of an entire modern processor architecture. So I've been reading through this thread kind of surprised to see people saying this one person did so. But your comment made me realize it's just typical hero worship.
Note: this doesn't belittle the work of anyone. Schumacher is an amazing driver, one of the best in the world. But saying he is responsible for winning formula one races belittles the work of the other people involved, engineers at the top of their game just as much as he is, and yet faceless to most people. He could be exactly the driver he is, or even a hundred times better, and without equally talented engineers behind him he'd still finish in last place. So did he win the races, or did they? Obviously, the answer is that they won together.
I think the point being made above is that good leaders consistently and forcefully resist the invitation to take personal credit for accomplishments that is inevitably and repeatedly thrust upon them by the public eye.
Probably the most interesting nugget is that Zen2 was actually considered a tweak internally and not a full uarch revision, while Zen3 is actually the clean-sheet revision. But it's just a good summary of the general tone and tempo of CPU development I think.
Well yes, this is exactly why I say he is wise. And also surprisingly humble. But if you listen to the whole story, how the architecture came to be, scratching reasons not to do it one by one etc., the picture seems pretty clear.
I’d be inclined to take the man at his word, humility or not.
If he says it was a team effort, especially if he’s speaking clearly about the different aspects and who was responsible for them, I wouldn’t disagree with him. He would know after all.
I would too, but this shows something to me with respect to his ability to launch a new fab: he knows what it takes to build and work with a team of teams, all with specialist knowledge. That’s something far more important than his ability to do everything himself, which is completely irrelevant at this scale.
There are few human endeavours that aren’t team efforts. Even if we aren’t collaborating day to day, we still stand on the shoulders of giants all the time.
Basically all high-end CPUs in the last 20 years (almost 30 actually; it started with the Pentium Pro in 1995) have a small internal instruction set to which all of the instructions sent to the CPU are converted. The part that converts instructions to micro-ops is called the frontend. There was a cancelled sister architecture to Zen called K12, which, as far as I can tell, was basically just Zen with a different frontend.
Others in the thread below have mentioned Jim Keller worked on DEC Alpha. DEC sued Intel for some patents related to Pentium, so even this bit of the history relates back to Jim somewhat.
Jim Keller is tied to basically every ground-breaking architecture of the last 30 years. Not that everything he's worked on has been a hit, but, if something ended up being a huge success there's a high probability Jim Keller has been skulking around somewhere. Conroe/Sandy Bridge are the only ones that come to mind that haven't been him.
His greatest hits: DEC Alpha, AMD K7 (Thunderbird/Athlon), AMD K8 (Athlon 64), Apple A5, AMD Zen... probably others I'm forgetting.
Like god damn can someone else in the industry have some ideas of their own please, lol /s
And again not that other people aren't involved either, in particular Mike Clark was really the guy who executed Zen development, but Keller was there at least as an advisor for a lot of the early development. He's one of Zen's uncles, Clark is the father. https://www.youtube.com/watch?v=3vyNzgOP5yw
I think it also says a lot that he completely fucking bailed from Intel after only being there a couple months... I think he saw they just weren't ready/willing to execute well and his time would be wasted there. He's the wandering silicon samurai who drops some golden nuggets of advice and wanders off into the sunset, not a babysitter while you purge middle-managers playing office politics.
(He left due to a "family situation" and while I have no doubt that it was real... I also think he probably might have stayed if Intel wasn't a complete dumpster fire too.)
Right. Back when Keller and Dobberpuhl were at P.A. Semi, I was waiting for them to IPO so I could invest... very low power 64-bit dual-core PPC chips from a few of the key Alpha people seemed just what Apple needed to solve their problems with IBM not being able to get the G5 power consumption down low enough for laptops.
I was disappointed when Apple switched to x86 instead, and a few years later, Apple acquired P.A. Semi, which I believe became the bulk of Apple's mobile processor team.
If rumor is to be believed, it really is too bad that Ken Olsen refused to drop margins and increase volume on the Alpha in order to create a scaled down version for Apple back when Apple was looking to leave the m68k architecture.
even if he's not the determining factor of success - as he clearly said it's a team effort - he's very likely a safe bet, and a good canary, a good advisor (not a yes-man).
Not quite. When you have a front end that splits an instruction into micro-ops, that's a fixed mapping of one instruction to 1-3 micro-ops, and your decoder can handle many of those every clock. When you have a micro-coded instruction, that instruction can turn into a whole subroutine of indefinite length, and on most current designs you can only decode one at a time; it takes over until it completes.
I'm interested to hear about this as well. I might be confabulating a bit, but just a bit, but I remember hearing Keller say that Zen is structured that way that there's nothing stopping them creating a different ISA frontend and say putting out an Arm processor instead of X64. That it actually wouldn't be much.
No, AMD’s Zen had a sister uArch that was ARM-based[1]. They were originally targeting it for servers, but ended up releasing only one or two fairly bog standard ARM cores before abandoning the venture.
> Long story short if I had money that would be relevant in this context I would invest really hard into Tenstorrent.
I would happily accept a bet (e.g. $100 bucks or a nice bottle of whiskey) from you (or anyone) that atomicsemi is successful. I will just bet against it because I am skeptical that a new company is successful in a capital intensive segment.
It is probably slightly insulting, foolish, and arrogant to bet against the famous Jim Keller on a website dedicated to startups. This should not be a dig against Sam and Jim Keller. I guess they will try something truly innovative. But if I look at the past, the odds seem to be stacked against them.
I bought an Alpha for a project that needed a lot of directly addressable memory, it was the first 64 bit architecture that was affordable and ran RedHat on it. That box paid for itself within the first week.
It was also the first (non-research) processor I'm aware of that was designed from the ground up to be 64-bit, without 32-bit addressing. Of course, as long as you can get the OS to allocate only within a given 4 GB range, you could emulate 32-bit pointers by storing only 32-bit offsets.
The processor's firmware (PALCode) was essentially a single-tenant hypervisor, and the OS kernel made upcalls to the firmware in order to perform any privileged instructions. Had the architecture survived longer, this would have been handy for virtualization. Modern OS kernels have special cases for upcalls when running on top of hypervisors in order to avoid some of the overhead of the trap-and-emulate code in the hypervisor.
The designers were brutal in only including instructions that could show a performance improvement in simulations. The first versions of the processor didn't have single byte loads or stores, presuming that the standard library string functions would load and store 64-bit words at a time and perform any necessary bit manipulations in registers. They later relented and included an instruction set extension for single-byte operations.
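For illustration, here's roughly what emulating a byte store with only word-sized operations looks like (a sketch of the general technique, not the actual Alpha library code):

```python
# Emulate storing one byte into a 64-bit word using only word-wide operations:
# mask out the target byte, then OR in the new value at the right position.
WORD_MASK = (1 << 64) - 1

def store_byte(word, byte_index, value):
    shift = byte_index * 8
    byte_mask = 0xFF << shift
    return (word & ~byte_mask & WORD_MASK) | ((value & 0xFF) << shift)

w = 0x1122334455667788
assert store_byte(w, 0, 0xAA) == 0x11223344556677AA
assert store_byte(w, 7, 0xAA) == 0xAA22334455667788
```

On real hardware this turns a single byte store into a load/modify/store sequence, which is part of why the byte/word extension mentioned above was eventually added.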
They were also famously brutal in their memory model, leaving as much leeway as possible for hardware to re-order operations. As long as you're correctly using mutexes to protect shared state, the mutex acquisition and releasing code will properly synchronize all of your memory operations. However, if you're implementing lockfree data structures, the Alpha is particularly liberal in its read ordering, and you need read fences on the reader side of lockfree structures, which is unusual. Experience has shown that for most code, the potential performance improvements aren't very significant, especially considering the increased potential for concurrency bugs.
I loved that box. It worked for many years and when we finally shut it down it really felt like the end of an era. This was when I emigrated to Canada where I stayed until 2007, it would have been nice to take it along but we were shipping enough stuff across the Atlantic as it was.
I'm pretty sure that if you had dropped that from the 10th floor of a random office building you'd be fined for damage to the pavement but that machine would have still worked ;) It also took two people to lift it.
I had a couple of Alphas in my home lab up until about 9-10 years ago (a largish DEC3000, a 'generic' 164PC, a DS20). Even as elderly boxes they were astonishingly well built, performant enough to do real work on, and gave a useful 'not an x86' check when I was testing for portability and such. However, I figured out they used a significant fraction of all the power consumed in the lab and generated heat like furnaces. When I did a tech refresh regrettably they needed to go. The 3000 in particular seemed like it was designed to go into combat.
Ah yes, the power consumption... don't get me started on that one. That box alone probably took more power than whatever else was living in that rack :)
And now your average phone has more CPU power and more storage...
I switched to 100% solar power here a few months ago and have powered down all of the more beefy stuff; power consumption went from 30 kWh/day to < 10...
To be fair, much potential of such a memory model (notably, where the data dependency of pointer-chasing (and anything else where an earlier load is used to compute a later load's address) doesn't force a matching order of when the load instructions execute (pull the data out of cache and into a register)) is only unlocked once you have substantial contention (or some contention plus latency between cache and actual memory, like if you go across PCIe/some-other-NUMA-fabric, or access Optane memory) along with load address speculation (this can be done automatically or through explicit prefetch instructions).
In the absence of a read fence, you can speculate the address of a second load, execute this second load in parallel with a first load, and ignore a cache invalidation (or just not wait until you can rule out an asynchronously transmitted one) hitting the second load so long as the address (computed from the first load's data) was correctly predicted.
It hits even harder when the predicted address was a cache hit and the first load experienced a cache miss, because now you can speculate execution using the second load's data (delivered from the cache) and retain/confirm/retire the results of the speculated computation as soon as the first load's data returns and the computation of the second load's address confirms the speculated one.
An example is read-only access to data structures with pointer chasing while a different core performs copying garbage collection. Because the data is (semantically) read-only, the old copy and the new copy are both equally valid, and as long as you don't accidentally read the new space before the copy was written into it, you can pointer-chase freely through these structures reading the next e.g. linked-list entry from either the old or the new place (if you speculate correctly).
Critically, this could get by with invalidating only cached data for the copy target range, ensuring readers don't get the uninitialized data, without invalidating their cache of the copy source range. Of course that would require sufficiently targeted invalidation.
Other cases like e.g. typical union-find / disjoint-set datastructures work just fine with standard fence-free Alpha memory accesses, at the slight cost of `union` operations not coherently affecting outcomes of `find` operations. That's often not a problem, though, as parallel applications already have to cope with the `union` racing the "subsequent" `find` operations (and ending up with the `union` happening last).
One thing that I found pretty interesting is the handling of exceptions: you could basically delay dealing with them and then, at the end of a block, check whether anything had happened.
I'm not an architect and don't know enough about the topic, but I thought that might be something interesting for RISC-V. I'd love to read about the advantages and disadvantages of that.
It's quite simple, actually: a customer needed a very large database (> 4G) and one proposal was to create some kind of sharding mechanism because all of the ways that they could think of to do this in RAM meant that they had to use a pretty large number of machines, complicating all of the ways in which updates, queries and keeping it all synchronized would have to be done, besides requiring a rack full of hardware.
The Alpha made all of that moot because in one fell swoop it increased the amount of RAM that could be addressed directly to the point where the whole thing could happen in memory without any cluster communication overhead. It was still an expensive machine, but it cost a fraction of the setup that it replaced, and performed really very well. A nice example of how vertical scaling can be a very viable option. The 64-bit file system also allowed for much larger files, which helped the project in different ways.
One downside was that spare hardware was difficult to obtain but the system was built like a tank and ran for many years until there were many other suppliers of 64 bit systems.
It was way ahead of anything else in the 'affordable' range of computers. Though it still cost as much as a nice car fully decked out; especially the RAM was quite expensive.
Having a 64 bit system at that time could have also really helped with implementing super fast virtual machines. You have so much space to store information in pointers.
Azul later realized some of these things on Java. Building a virtual machine and even language to take advantage of that from the ground up would have been cool.
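As a small illustration of the "space in pointers" point, here's a sketch of the kind of pointer tagging a 64-bit address space makes comfortable (purely illustrative; not modeled on Azul or any particular VM):

```python
# Stash a small type tag in the low bits (free because allocations are aligned)
# and a generation counter in high bits that a 48-bit address space leaves unused.
TAG_BITS = 3
TAG_MASK = (1 << TAG_BITS) - 1
ADDR_MASK = (1 << 48) - 1

def tag_pointer(addr, tag, generation):
    assert addr & TAG_MASK == 0          # requires 8-byte-aligned allocations
    return (generation << 48) | addr | tag

def untag_pointer(ptr):
    return (ptr & ADDR_MASK & ~TAG_MASK,   # address
            ptr & TAG_MASK,                # type tag
            ptr >> 48)                     # generation / GC metadata

p = tag_pointer(0x7f00_0000_1000, tag=0b010, generation=5)
assert untag_pointer(p) == (0x7f00_0000_1000, 0b010, 5)
```

The hardware (or runtime) has to mask the extra bits before dereferencing, which is exactly the sort of trick a from-scratch 64-bit VM design could bake in.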
That'd be cool. The skeptic in me feels like this is the electronics equivalent of when someone who directed a famous MMO starts up a new game company with some other famous cool game designer/developer that everyone loves, and then they take in a whole bunch of money and never actually do anything (not that their fans would ever admit it).
Different spaces, different people, but I'm going to sit back and observe I think. I'm just kind of done being excited by people announcing the partnerships, once the partnership bears fruit I will have a look and then decide if I'm excited.
Curious if there are practical reasons to have a semiconductor company in the SF Bay Area vs anywhere else at this point.
I know about the history of semiconductors in the SV but so much has changed in the world since then.
Does being located here these days actually bring practical advantages to running that kind of business, like having a greater talent pool? Or is it more of a branding/identity thing?
Not sure about a semiconductor company in the SF Bay Area but when it comes to fabs my (possibly controversial) take on this that there are already plenty of superfund sites from the old times there and it's probably better to reuse them instead of contaminating more land.
I'm also not sure if we will really see a manufacturing renaissance but if we do I bet it will go hand in hand with a lowering in environmental standards.
If your target customers are tech companies in the Bay Area it might make sense. I expect ASIC adoption to increase in datacenters over the next couple decades, for example, so if I were targeting that market I’d probably start in the Bay Area (given that Seattle has less history of semi manufacturing).
You have Lam, Applied, and KLA in SV, which are three of the five largest equipment suppliers, along with the entire supply chain and support ecosystem. You also have most of your likely customers nearby.
So it remains expensive and still a great place to start.
"Fab" seems a bit of an overstatement. What he is describing is similar in capabilities to what any average university lab with a device fabrication facility is capable of.
A bit less: e-beam has significant limitations in the amount of area that you can cover at once (and is much slower than EUV). Not really suitable for anything besides prototypes and very small series. But once you have a prototype and it works, you have the level of validation required to throw real money at a project.
I’m pretty sure that’s what they will do initially and once they get proof of concept they will seek funding for large scale. They will need a huge investment to get going on any sort of scale.
When trying to research the company (should've clicked on the HN link really) I first found https://atom-semiconductor.com/ - I wonder if they're going to struggle with the name / trademarks considering there's an established semiconductor company only 2 letters apart (atom semi vs atomic semi).
Great to see an apparently US-based fab opening though (I'm not from the US, but anything not based in China seems to be good for the industry)
> Our total compensation package also includes generous equity in Atomic Semi.
That could well turn out to exceed the regular wages. Besides, working with a small world-class team is a great learning experience and big reward in itself.
I'm glad to see the day has come where I see an advert for a "full-stack" hardware engineer.
I don't know anything about wages in SF but the requirements are niche to say the least, and all wrapped in "2+ years of industry experience". Good luck to them.
That jumped out at me too. They want an experienced "full-stack" ME/EE with strong knowledge in multiple areas of physics for a very speculative attempt to essentially create a new kind of manufacturing by custom-building all the hardware, and their salary range is $100k-170k in San Francisco? That seems a bit low for a unicorn candidate, especially given what they're going to have to spend on the hardware itself.
The people they'll attract with that job description are already wealthy, likely semi-retired and looking for something to do. They won't care about the salary. They will cause headaches for the bookkeepers by forgetting to deposit their paychecks.
If this leads to a startup with a parking lot full of Porsches, well, that's not such a great sign, having BTDT. You want lean and hungry people at a lean and hungry company. People with something at stake -- if only their pride -- and something to prove. Not people who were "10x performers" at their former startup and are now, post-exit, convinced of their own intellectual immortality.
IMO they're better off hiring engineers like everybody else does, by paying market compensation.
Assuming 170k is what they’d pay people who actually meet all those requirements and 100k is for new grads (eg someone with an MS or PhD that exposed them to the “full stack”), it seems reasonable as a salary, considering they are paying equity as well, and may include bonuses.
170k is a pretty normal senior SWE salary for Bay Area startups or even FAANGS. The crazy TCs come from equity and, to a lesser extent, bonuses.
I've had two jobs at pre-Series A startups in Houston with salaries near the low end of that range. A higher-profile, more capital-intensive SV startup looking for someone with a very broad skillset ought to be able to pay more, shouldn't it?
That's a better and more nuanced opinion than my own - I went from EE to CE because I wanted an easier path to working as a SWE after I heard about my prospects at Intel or Boeing, or really as a hardware engineer in general.
Question though - if your salary was at the low end of that range, why does this seem too low? What would seem like enough? I'm assuming those jobs were in related fields and that you would have a stronger impression of what would be attractive for a person in that position than I would.
I'm more of a firmware guy, myself, although I've done some PCB design and a fair bit of hardware debugging. My startup experience has been in the oil industry at companies with fewer than 5 people.
I don't have actual data backing me up. It seems low to me because I have gotten the following impressions, some of which may be wrong:
* San Francisco is one of the most expensive cities in the world.
* Startups in Silicon Valley routinely receive tens of millions of dollars in funding despite having questionable business plans, conflicts of interest, and mentally-unstable founders.
* Even SWEs who just graduated from college can get upwards of $150k/year at major tech companies, even if they spend most of their time just copying and pasting from Stack Overflow.
* Good electronics engineers are rarer than good SWEs. Good electronics engineers who can also design high-precision machines are extremely rare, and the ones with semiconductor experience are probably earning big salaries at large companies.
* Hardware development is much more expensive than software development. In particular, hardware design errors can be much more expensive to fix.
* Getting a new manufacturing process up and running is a lengthy and capital-intensive task. Creating a totally new kind of manufacturing process to "disrupt" an established, competitive industry could charitably be described as "high-risk".
* Hardware startups rarely make it big, and the ones that do seem to be associated with trendy software stuff like crypto, AI, and quantum computing.
To summarize, a startup that's going to be spending a lot of money on hardware development in a very expensive city wants an experienced engineer with a specific and unusually-large skillset, and they're offering a salary that barely competes with an entry-level SWE job along with some high-risk stock options.
As another comment said, it sounds like they want someone who will treat this as more of a hobby than a job. Which is fine, but I feel like the proper job title for that is "independently-wealthy cofounder", not "hardware engineer". (Alternately, I've heard that highly-specific job offers can be used as an excuse to get an H-1B visa -- maybe they're looking to hire someone from Taiwan. Or maybe I'm just being cynical.)
For the skillset being asked, if you ignore the equity component (as I'm guessing GP did), $130k is absurdly low anywhere in the US. Folks in many metro areas (Portland, Austin, Phoenix, etc) with the desired XP and skillset can easily make $250k cash salary + $50k highly liquid public equity at major companies (Intel, Samsung, Globalfoundries, etc).
Anyone with that XP concretely applying those skills at most commercial fabs is easily creating millions of dollars of value (as a portion attributable to their contribution to the overall fab's value creation); if anything, it's absolutely insane that they get compensated so little on average.
Everything I know about Sam suggests to me he’s never been in this for the money. He just thinks chip fabbing is fun.
I think he’s got enough of a twitter following that he might be able to attract a crew who has the same recreational attitude towards semiconductor fabrication.
Semiconductor manufacturing thrives on economy of scale. It would be very appealing to disrupt that field by enabling cost-effective manufacturing of smaller volumes as well. This is especially true for customized technologies and devices.
I wonder whether this is similar to what Sam Zeloof has in mind? I went through the website, job postings, twitter threads etc. but I am not sure what they are actually trying to do.
There used to be a crazy startup with a similar business approach in mind that tried to manufacture devices in small spheres of silicon: ballsemi. Their website seems to be defunct, but there is an article here:
In particular it would be interesting if they can make their manufacturing more accessible. Want to fab a chip on TSMC? You'll either need to find another company to contract/partner with who has an existing relationship with TSMC and can do all the back-end implementation work for you or setup such a relationship yourself, including onerous NDAs so you can get PDK access (that's assuming you can manage to get a conversation going with them in the first place).
It'd be fabulous if atomic went the Open PDK route like Skywater 130nm: https://skywater-pdk.readthedocs.io/en/main/ but even a setup with a 'here's the fab price list click here to pay us $x'000k and sign our NDA to get access' would be a big step forward. Combined with reduced manufacturing costs it would make it far easier for smaller players to experiment with custom silicon.
> It'd be fabulous if atomic went the Open PDK route like Skywater 130nm: https://skywater-pdk.readthedocs.io/en/main/ but even a setup with a 'here's the fab price list click here to pay us $x'000k and sign our NDA to get access' would be a big step forward. Combined with reduced manufacturing costs it would make it far easier for smaller players to experiment with custom silicon.
Yeah, but the business model of providing accessible standard CMOS technology would not really require building new tools, would it? I think the appeal would be in being able to build devices in highly customized technology. This would require fast learning cycles at low cost. Not really something an MPW helps with.
Well, if your view is that the barriers to entry to standard CMOS technology are far too high and it's possible to do things more cheaply and openly (e.g. using cheap equipment sourced from China as per Zeloof: https://mobile.twitter.com/szeloof/status/154993704406717235...), then you might conclude some custom tooling to be a key part of the puzzle, to help tie together everything you can get off the shelf (or in time replace off-the-shelf tooling to get a neater solution).
I mention skywater 130nm not because of the MPW program but because it's an open PDK, you can just download it and go use it, without needing to pay anything or sign any legal agreements. Maybe Atomic would go the same way?
His claim to fame is making small ICs in his home garage while still a teenager, fairly recently. That's no small achievement.
(He seems to have had access to the kind of garage and equipment other engineering kids can only dream of though. I know someone who really wanted to do the same thing a few years earlier, who had the knowledge but couldn't get themselves a suitable place and budget to do it.)
As an observer in this space, this feels somewhat significant.
Jim is, from what I can tell, an extremely good leader, and Sam appears to have limitless energy for experimentation and limitless passion for this field.
I hope they not only succeed with their fab, but that in some way they can make at-home (or otherwise small-scale) prototype chip manufacturing easier, less expensive, and with fewer barriers to entry. Given the physics involved I don't know how much they can succeed at that, and I don't even know if that is a goal of theirs, but it would be nice to see, certainly.
These folks deserve success and I'm sure they will attain it, no matter what their goals are, if their fate is in their own hands.
I mean, he is either trolling a bit or has a very simplistic view of buying niche equipment from Alibaba. It's far from click and collect (eg. pricing, export/import/transport) and that's even before you get up and running.
Either way, from my experience (with much simpler stuff), the thread does not represent the reality of procuring things from Alibaba.
I read the Alibaba part as hyperbole. I don’t think he was talking about actually getting TEUs of this stuff through customs—just making a point on pricing.
One of my points is that I do not trust that pricing. In my experience the price ranges on Alibaba were very wild and often used as an incentive for you to contact the seller, and after some back and forth you got a 'real' offer. Taobao (or AliExpress in the west), on the other hand, usually contained 'real' prices. I can't say for sure for those listings, but I would take everything at those links with a grain of salt.
> One of my points is that I do not trust that pricing. In my experience the price ranges on Alibaba were very wild and often used as an incentive for you to contact the seller, and after some back and forth you got a 'real' offer.
In a similar vein:
> But I do note that if you run the structure through SciFinder, it comes out with a most unexpected icon that indicates a commercial supplier. That would be the Hangzhou Sage Chemical Company. They offer it in 100g, 500g, and 1 kilo amounts, which is interesting, because I don't think a kilo of dioxygen difluoride has ever existed.
Isn't the point to demonstrate that this process is possible with inexpensive, commodity, off-the-shelf parts? And if so, then the approach is viable.
I doubt he's talking about building a company that sources their machinery/parts/gadgets directly from places like Alibaba, Taobao, etc...
The difference seems to be scale of output. This equipment looks like it has low throughput, whereas the ASML machines ($$$$) can be used to churn out high volumes.
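For a rough sense of that gap, here is a back-of-envelope sketch in Python. The throughput figures are my own illustrative assumptions, not numbers from this thread, so treat the result as an order-of-magnitude estimate only.

    # Back-of-envelope: how many direct-write e-beam tools would it take to
    # match one EUV scanner's throughput? All numbers below are assumed,
    # illustrative values, not measured figures.

    euv_wafers_per_hour = 150          # assumed: EUV scanner throughput, order of hundreds per hour
    ebeam_hours_per_wafer_layer = 24   # assumed: direct-write e-beam, one wafer layer per day

    ebeam_wafers_per_hour = 1 / ebeam_hours_per_wafer_layer
    tools_needed = euv_wafers_per_hour / ebeam_wafers_per_hour

    print(f"e-beam tools to match one EUV scanner: ~{tools_needed:,.0f}")
    # ~3,600 under these assumptions -- which is why per-tool cost and
    # massive beam parallelism dominate the economics discussion above.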
I thought Jim Keller was more of a CPU design guy. Like, I could imagine him starting a fabless CPU or GPU design company. I gotta admit, a fab is definitely an interesting and rather unexpected direction for him to go down...
Yeah, I feel like I'm missing something here, but in the other thread the guy says you can get everything you need for 5nm from Alibaba. Afaik, to get transistors that small (5nm) you need EUV lithography, and is there anyone other than ASML manufacturing those machines?
Lithography isn't the only way to construct a transistor; it is possible to construct much smaller transistors using alternative techniques. Not at scale, though.
To be fair, I don't think he walked away from either Tesla or Intel thinking "job done", and I'm not sure how this new role is going to interact with his position at Tenstorrent, especially since he's literally just moved into the CEO role.
Maybe I am dense, but where is Atomic Semi itself saying that it is being started by Sam Zeloof and Jim Keller? The title claims that, and I thought titles weren't supposed to be editorialized.
Makes a lot of sense to me. Intel's production cannot keep up and so it's only TSMC and Samsung with a big gap. I think some people will be surprised when TSMC starts taking more and more of the profit from the value chain in the next few years.
Just rent or sublease an old one. Clean room + water supply + waste processing facilities are not that uncommon. They have the most famous chip designer in living memory as their founder, finding a fab should be the least of their concerns.
I had never heard of Zeloof. Wikipedia calls him an autodidact but I guess he chose to go to college (CMU). This Wired feature gives a bit more color (mostly just wowing over his home lab):
Buying equipment is one step in a very long, difficult, and expensive journey towards creating and maintaining a reliable and economical manufacturing process. Having worked in semiconductor process development, I can tell you that the people there do a lot more than "Make power points and fuck around"[1].
Doesn't a new semiconductor fab require billions of dollars to build? How will they organize mass production? Or maybe they will manufacture only small batches of custom chips?
Speaking for a friend. Yes there is, but quietly and slowly bootstrapped without publicity. Life events got in my friend's way so it's just as well they didn't take funding those years ago. VC in the UK is a pale shadow of VC in San Francisco anyway.
Very similar goals, including a fab suitable for low-NRE open source hardware design iterations, but a grander vision: making more of the hardware tools as well instead of purchasing them, building a stack that is not silicon-specific, and making extensive use of physics simulation and feedback to design specialised toolchains. That is partly because bootstrap financing requires inventing cheaper tooling than is available on the market, not just cheaper lithography.
Atomic Semi will almost certainly get there faster due to funding, a better network, more pragmatism, the track record of the people involved, and location. But my friend's thing might go further into advanced and novel capabilities, eventually, if they continue with it.
Interestingly, their original goal was to stimulate a culture of open source hardware at the lower levels by providing a service to fabricate new designs at low cost, the way that has already happened with software, i.e. anyone can learn it in their bedroom and take it as far as their skills permit, causing a significant change of culture and knowledge sharing. That goal doesn't seem to be needed any more because of the Efabless-Skywater-Google collaboration making hobby-level ASIC fabrication available for free, and the rapid increase in available open source hardware design tools at steadily higher quality. And things like Atomic Semi emerging. The culture has already changed.
Honestly this is my favorite type of startup: a small company trying to do something insane, previously restricted to very large and resource-rich companies. Actually disruptive.
Jim Keller is still CEO of Tenstorrent right? If so then is this Jim vertically integrating in a way? I don't know much about Grayskull tbh, but this would make sense.
> Honestly this is my favorite type of startup: a small company trying to do something insane, previously restricted to very large and resource-rich companies. Actually disruptive.
Well, there's comma.ai [0], which is a rare example of a small startup actually achieving what they have.
Something that managed to be safer than Tesla FSD (Fools Self Driving), and not some vapourware like Zoox, Drive.ai, Lyft, and Uber's self-driving ambitions, which have all amounted to very large, costly contraptions on the road to nowhere.
[1] https://mobile.twitter.com/szeloof/status/154993704406717235...