TSMC Nanke 14 Factory Production Interruption Could Affect NVIDIA and Others (hardocp.com)
133 points by IMTDb on Jan 28, 2019 | 84 comments


Let me tell you a story about sulfuric acid. When I was working process integration at a fab, I had a long series of meetings about qualifying a new source of sulfuric acid. Not a new supplier even, nothing that risky. Just a new source, meaning the iso (tanker truck) full of sulfuric comes from a different plant owned by the same chemical company.

What it comes down to is that it's basically impossible from a process integration perspective. Sure, you can take assays, and you do. But what about the things you aren't looking for in your assays? All you can do is plug it in, cross your fingers, and look at the yield trends. Not only does sulfuric mix in the supply lines, it gets used throughout the entire process. So if you do have some kind of strange contaminant that you didn't test for, maybe it only actually hurts you pre-gate. And the damage ramps up slowly and unevenly. But if your yield tanks, then you have to scramble to figure out what happened, and sulfuric isn't even your first suspect. (Because there are thousands of things like this that can tank your yields.)

What's worse, you're literally over a barrel. Sometimes Plant A has to shut down and can't send you another iso. Do you take an iso from (unqualified) Plant B? Do you crack open a barrel of sulfuric? You have a stock of qualified sulfuric in barrels, for situations like this, but in practice barrels are worse than isos. Metal everywhere.

I'm not saying it's sulfuric in this case. Lots of chemicals have a story like this one, but it would also not surprise me one little bit.


This is absolutely fascinating. Do you have any recommendations for books/videos/articles that shed more light on some of the interesting minutiae of cutting edge fab processes?


I'm afraid not -- I learned what I know through direct experience. That being said, I haven't really gone out there and looked for resources since I got out of the wafer business.

Also, "cutting-edge" made me smile a little bit; this particular issue was mostly about plumbing :)


Wouldn't using multiple Spectrometry/Spectroscopy methods help in finding out the impurities in the process chemicals?

ICP-MS and NMR should cover most elemental impurities.


I don't have any experience working in that industry.

But based on what I know, modern ICs require exceptional levels of purity of the materials. A single atom can ruin a transistor, and therefore the complete chip.

AFAIK the methods you mentioned aren't sensitive enough to detect individual atoms in barrels of stuff. E.g. ICP-MS detects 1E-15 concentrations, but in absolute numbers that is still a lot: 1E15 atoms is only 1.66e-9 mole, so for iron (55.8 g/mole) a 1E-15 concentration by mass translates to about 1E+10 potential defects per kg of stuff. Way too many.
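
The arithmetic above can be checked with a few lines of Python. This is only a rough sketch: it assumes the 1E-15 detection limit is a mass fraction, and the helper name `atoms_per_kg` is made up for illustration.

```python
AVOGADRO = 6.02214076e23  # atoms per mole

def atoms_per_kg(mass_fraction, molar_mass_g_per_mol):
    """Impurity atoms in 1 kg of material at the given mass fraction (assumed unit)."""
    grams_of_impurity = mass_fraction * 1000.0      # 1 kg = 1000 g
    moles = grams_of_impurity / molar_mass_g_per_mol
    return moles * AVOGADRO

# Iron at a 1E-15 mass fraction, molar mass 55.8 g/mol as quoted in the comment
iron_atoms = atoms_per_kg(1e-15, 55.8)
print(f"{iron_atoms:.2e} iron atoms per kg")

# Cross-check of the "1E15 atoms is only 1.66e-9 mole" figure
print(f"{1e15 / AVOGADRO:.2e} mol in 1e15 atoms")
```

Under that assumption the result lands on the order of 1E+10 atoms per kg, in line with the figure quoted in the comment.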


> and therefore the complete chip.

Most semiconductor companies bin chips based on how broken they are. GPUs, for example, frequently have different skus with the same chip differentiated by how many processors are broken (and thus disabled).


Wouldn't you be able to run the test multiple times and increase your sensitivity to an arbitrary level? Too expensive?


Studied a fair bit of chemistry, but not a pro, so this is only a rough outline.

A lot of chemical test equipment uses semiconductor sensors of some kind, optical or otherwise, and most likely the sensitivity limit is simply the noise floor of the sensors. Good instruments tend to use good, or even amazing, detectors, but they are still operating at the noise floor, or sometimes even below.

So running tests again doesn't really help sensitivity. For that you would most likely have to use some chemical process that amplifies the effect of the contaminants you are looking for. Hence the problem of having to know what to look for when validating new chemicals, if you need ultra-pure chemicals.


Elemental impurities aren't what we were worried about. It was organics. How sure are you there's no weird bacterium that lives in sulfuric acid and likes to eat iso tank lining? Would you bet $100 million against it?


Why don’t they have in house labs for creating all their chemicals? Seems like the investment would pay off.


- Those labs would run into the same sort of problem. It’s not like the supplier is contaminating on purpose. In-house would also produce less, making the staff less experienced.

- Chemicals aren’t created from thin air (although that is one ingredient). You will still have products coming in, only now you are one step closer to, say, raw oil, which has far more natural variation than the refined product you previously bought

- The secret to, like, half our wealth is specialization. Doing complicated stuff as a side hustle in-house just isn’t how our economy works.


We're not talking about a couple of people in white coats here. It takes a full-on chemical plant to supply enough sulfuric to satisfy a fab, and unless you're going to throw a ton of stuff away, this plant is going to produce a lot of other chemicals that are of no interest to the fab. That's not to say a vertically integrated conglomerate couldn't own both a fab and a chemical plant -- this is the rule rather than the exception in Japan and Korea -- but these are two separate industrial concerns.


Every once in a while, one hears about some accident like this one at a single, physical factory, followed by a laundry list of tech companies that are adversely affected. It powerfully demonstrates how awfully centralized microchip production is; individual plants can account for substantial fractions (or even majorities) of entire product categories, which means that individual incidents can have outsized effects. I don't know if that sort of risk has been fully internalized by the tech sector.

For that matter, how specialized is the market for whichever high-purity chemicals were behind this incident? What portion of the semiconductor industry was purchasing from the same supplier? We may not have heard the last of this.


It's not just the tech industry that is concentrated like that. I think more and more industries have these insane levels of concentration.

In 2016, an explosion at a single industrial gas facility caused a whipped cream shortage in the US. [1] I mean in the scheme of world problems, that's one we'll survive, but it does illustrate how vulnerable a lot of our supply chains are.

In 2007, the shutdown of the Canadian Chalk River reactor caused worldwide shortages of isotopes used for medical imaging, with real life consequences for patients. [2]

There are probably countless other examples. In the end, it's the direct consequence of the constant quest to lower production costs and increase efficiencies.

[1] https://www.theatlantic.com/science/archive/2016/12/the-dead...

[2] https://www.nytimes.com/2007/12/06/business/worldbusiness/06...


Redundancy just isn’t rewarded by the market, yet another form of market failure.


I think more often than not it is... not sure where you would get that idea from.


Maybe from the current article, or similar situations where a single point of failure leads to huge delays, financial losses, bottlenecks or other issues for any number of companies that are reliant on this one point. It's a problem that's increasingly affecting many industries, where the number of suppliers capable of providing certain parts is in the low single digits, or the source of a certain material can even be traced to a single place. EUVL is a great example. If anything were to happen to ASML, 7nm would be dead in the water for a long time.


The increasing trend of the world to be dependent on a single source for things... Must be something pushing the trend.


It depends on the likelihood of shortage. Some industries are better at this than others: automotive companies use multiple vendors for everything (most easily seen with tires: the same model could be supplied with tires from two or three different vendors).

About a decade ago I visited a European fab of one of the European semiconductor manufacturers, where they told me they are also manufacturing for an American semiconductor firm (which had its own network of fabs). The reason was insistence from auto companies to diversify the supply chain, getting the same processor from multiple fabs in different continents.


> I don't know if that sort of risk has been fully internalized by the tech sector.

It’s known and understood completely within any silicon company. If you can afford it, and it’s technically possible (no special deals or tech), you’ll require at least some functionality at a second fab. What you cannot do is immediately get a second fab up and running. There’s a surprising amount of process tuning, sometimes accommodating new device models, and design rule differences (they will not guarantee or sometimes even start if they’re not met). It’s in no way turn-key if you’re anywhere near the limits of what the fab can do.


> As shrinking becomes more complex, requiring more capital, expertise, and resources, the number of companies capable of providing leading edge fabrication has been steadily dropping. As of 2018, only three companies are now capable of fabricating integrated circuits on the most cutting edge process: Intel, Samsung, and TSMC.

https://en.wikichip.org/wiki/technology_node#Leading_edge_tr...


I was just thinking about this while reading about a (the?) company that produces extreme ultraviolet lithography scanners. Intel, Samsung, and TSMC all buy EUV scanners from this one company, ASML, to make their ~7nm chips. Only 31 of these machines exist in the world, with another 30 orders projected for this year.

https://www.anandtech.com/show/13904/asml-to-ship-30-euv-sca...


At $100+ million a pop, not cheap.


I think the problem is that cutting edge chip fabs are insanely expensive. Like $10-$20 billion dollar range. It's also ridden with specialty, so the risk of (very) expensive setbacks and fuck ups is much higher for new entrants to the market.

The barrier to entry is massive, even for mega corporations.


> Like $10-$20 billion dollar range.

And over 100B for an entire fab complex, if you want to have even a remote hope of attaining real economies of scale.

This is why I predict that this industry will go the way of jet engine manufacturing, without any hope of any new competitor coming.


This is right on the back of nVidia's stock taking a beating after admitting that the global market for GPUs has become soft from the collapse of the crypto bubble.


You mean like AWS being down?


Or a bad BGP route being broadcast, whether intentionally or not:

https://bgpmon.net/bgp-leak-causing-internet-outages-in-japa...


Not even- I don’t know the relevant numbers, but I’d bet there are many more AWS regions than factories making the same kind of chips as the one in the article.


I love how Moore's law is a complete marketing gimmick at this point:

- Intel not only hasn't released 10nm chips yet, they even stopped publicizing transistor count of their 14nm, the node from five years ago.

- Not to mention their i7 chips node over node being marginal improvements for the past 10 years or so.

- And all of a sudden, Intel's competitors seem to be right on track with 7nm, while the Moore's law giant itself, the one with the most resources, is struggling at 10nm.


Do TSMC or their customers have insurance for this sort of thing? It would seem prudent given the stakes.


Should we expect an increased cost as consumers? I remember when Hynix foundry had a fire, RAM prices went up a lot.


I believe so, yes. Supply and Demand.


Out of the GPU mining fire and into the incompetent fab frying pan.


This does not bode well for Nvidia either; they already had a hard Q4 2018. They expect to report $2.2 billion in fourth quarter revenue versus the previous guidance of $2.7 billion.


I know it's wrong of me, but I can't help but be a bit relieved with the recent bad news for Nvidia. This time last year (and the year before that) it was looking like they were about to have a total monopoly on the GPU market. This will give AMD time to catch up a bit and Intel a good market to release their own GPUs.

With Intel's CPU dominance being withered by AMD and ARM solutions, and Nvidia's own dominance being thrown into question, the hardware industry is getting more interesting all the time.


I like seeing underdogs create something better, but a failure like this just holds everything back.

Prices will be higher, and the need to compete on quality/performance will be lower.


They can't move the new products, according to this morning's talks, so they might benefit from a shortage. They also have a history of shady tactics like GPP to muzzle bad reviews.


Bad reviews aren't their problem now, it's that people won't pay an extra $400 for incrementally better performance and a missing-in-action feature set.


I would pay it, and I did pay it. I need a bunch for on-prem Deep Learning. My test rig of 4 RTX 2080tis artifacted after about a week of use. I'd like to buy them by the handful but I've not been able to source a reliable supply. Given this news, it looks like I may have to wait a lot longer.


I'm in the middle of a similar build, and I'm curious how you got 4 of those to work well right next to each other. Are they FE cards, or some other variant?


I planned on water cooling but I never made it that far. They were FE cards. I do miss blower style but I understand they're hitting limits with it.


Yeah, custom loops seem like an ongoing chore, I was tempted and then gave up on that idea. 1000W is no joke to dissipate, either.

Interesting that FEs are working, I had heard that that configuration was problematic. Have you seen much thermal throttling with them butting up against each other? I was going to resign myself to one blower 1080Ti in slot 1 and then a 2080Ti in 2 and 4, but maybe I don't have to :-)


And much higher wattage, across the entire lineup.


Oh, that is absolutely disastrous if true. With their tight margins... oof. What can I say? That is ridiculously bad.


Their margins aren't as tight as you might expect. Their last quarter report[1] puts their gross margin at nearly 48% and their operating margin at 37%. That is pretty healthy and suggests they can easily withstand this sort of event.

[1] https://www.tsmc.com/english/investorRelations/quarterly_res...


Don't worry, they have tons of old stock to sell..


Primo Levi's "The Periodic Table" has a couple of chemist stories from his post-WWII experiences that speak to this problem: unknown adulterants, effects on process chemistry, the weird moments in "DON'T CHANGE IT" culture. Older times, but a similar problem? And beautifully written.


What are "wafers"?


After a silicon crystal with 99.99999999+% purity is grown, it is sliced into large pieces called "wafers". Transistors are then built on top of the crystal structure. All together, these transistors create chips. Somewhere between 75 and 7500 chips are made per wafer, depending on the size of the design.

Yes, 10+ nines of purity. And the impurities in those crystals were put in there on purpose (to make N-type or P-type silicon). I never personally understood the chemistry or physics behind the process, but it's always cool thinking about how exceptionally pure this whole process is.

In any case, if a run of wafers is bad, that can easily be hundreds of thousands of dollars' worth of chips per wafer. For whatever reason, it seems like these wafers did not have the 99.99999999% purity needed to successfully make chips, so everyone's chips are ruined. Whatever the issue is, you can rest assured that a ton of business folk are going to be pissed.
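
For a feel of where the 75-to-7500 chips-per-wafer range comes from, a common back-of-the-envelope estimate divides wafer area by die area and subtracts a term for partial dies lost at the edge. A minimal sketch, assuming a 300 mm wafer and ignoring scribe lines, edge exclusion, and yield; the die areas are illustrative and not taken from the thread, and `dies_per_wafer` is a made-up helper name.

```python
import math

def dies_per_wafer(wafer_diameter_mm, die_area_mm2):
    """Gross die estimate: wafer area / die area, minus an edge-loss term."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    # Approximate count of partial dies lost around the circumference
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(wafer_area / die_area_mm2 - edge_loss)

# Illustrative die sizes: tiny chip, mid-size chip, very large GPU-class die
for area in (10, 100, 800):
    print(area, "mm^2 ->", dies_per_wafer(300, area), "dies")
```

With these assumptions a very large die gives well under a hundred candidates per wafer and a tiny one gives several thousand, which brackets the range quoted above.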


> This accident stems from the fact that imported chemical materials do not meet the requirements, resulting in flaws in the wafers produced.

It's possible one of the washes or doping agents was contaminated, rather than the wafers themselves.


If you have time : https://www.youtube.com/watch?v=NGFhc8R_uO4

It's great.


Always flattered when this gets posted.

Always wish I could go back and re-record this talking about 50% as fast.


> Always wish I could go back and re-record this talking about 50% as fast.

I'm listening to this at 1.5x, which is my usual listening speed for non-musical videos. Your speed is fine.

Toastmasters [0] has helped me reduce my use of filler words: ahh, uhm, and, okay, so, you know, like, well, [pregnant pauses], [double clutches]. It's hard for me to listen to politicians anymore, because they're always gumming up their speech with linguistic crutches. I didn't use to notice.

[0] https://www.toastmasters.org/

"When we find ourselves rattled while speaking — whether we’re nervous, distracted, or at a loss for what comes next — it’s easy to lean on filler words. These may give us a moment to collect our thoughts before we press on, and in some cases, they may be useful indicators that the audience should pay special attention to what comes next. But when we start to overuse them, they become crutches — academics call them disfluencies — that diminish our credibility and distract from our message." [1]

[1] https://hbr.org/2018/08/how-to-stop-saying-um-ah-and-you-kno...


Thanks!


Thank you for making and giving the talk. It got me interested in lithography.

Do you have any plans to give an updated talk? It would be interesting to know about what comes after 7nm.


Unfortunately I'm not the guy to do so...I left the industry for a lot of reasons including my own mental health :)


Also thanks for this interesting talk, really appreciate that you spent the time to create and share it.

Which industry are you in now?


I ended up a professor at an engineering school. Found a passion for education.


That's great to hear. I am convinced you have talent for teaching. Wish you all the best.


Great talk. Thanks for making it available.

Soooo, it's now 2019.. is EUV working?


It's the underlying material chips are made on: https://en.wikipedia.org/wiki/Wafer_(electronics)


> A wafer, also called a slice or substrate,[1] is a thin slice of semiconductor material, such as a crystalline silicon, used in electronics for the fabrication of integrated circuits and in photovoltaics for conventional, wafer-based solar cells.


Oof, I hope they can absorb the cost :-/


Since it sounds like they didn’t catch it until near the end, it’s probably north of $65k/wafer. Probably single digit billions would be an accurate guess for the total loss to TSMC.


Worth mentioning that this is TSMC's fuckup, not Nvidia or the other companies using their fab.

This is so incredible it makes me wonder if we're looking at industrial sabotage. I can't believe TSMC wouldn't have safeguards in place to prevent contamination under normal circumstances, much less at this scale.


In confidence, I’ve heard stories of simple cost-saving measures at fabs that have resulted in 9+ figures of damages. Honestly, I am impressed at how well these companies keep it out of the news. It’s a shame because the stories are really quite good.

You can have safeguards against contamination but these safeguards aren’t 100% reliable. The article reports “substandard” chemicals and that’s an umbrella term that includes contamination and many other problems.

Speaking as nothing more than a hobbyist, I can tell you that analog photography suffers from many of the same problems you might see in semiconductor manufacturing, only on a much smaller scale. I used to mix my own photochemicals from raw reagents and it’s a complex subject, to say the least. Exposure to air and minerals in the water have all sorts of effects, and the standard way to test your process is just to run film through it. I’m sure that fabs have better testing equipment than I do, but at the end of the day, it’s not feasible to test everything and I’m not surprised that a bad batch of chemicals made it through, ruining many batches of wafers due to the sheer depth of the manufacturing pipeline.

With photochemicals, a small change in the developer formulation can result in what is more or less a completely black and unworkable negative, or possibly a blank negative. I expect semiconductor manufacturing to be similar, since both processes rely so heavily on knowing reaction rates. Kinetics is complicated, to say the least. For photochemistry I rely heavily on using known developer / film concentrations and being borderline religious when it comes to temperature and time.


>In confidence, I’ve heard stories of simple cost-saving measures at fabs that have resulted in 9+ figures of damages. Honestly, I am impressed at how well these companies keep it out of the news. It’s a shame because the stories are really quite good.

Dude at vendor switched a label on two boxes. Parts inside looked identical, and were installed. No way to tell until ~week later when chips hit end of line test and didn't work.

An average process tool (not even a complicated one) probably has a dozen different gases and chemicals running into it. In a fab it's hundreds, likely thousands. Things happen. They all come from vendors around the world, made in batches that vary and rely on negotiated compliance standards and qualifications.

Systems are everything. Wafers travel in boxes 10,000x cleaner than an OR inside a building 1,000x cleaner than an OR...everything matters.


Who benefits from this, though? The first one that comes to mind is Intel... Although I am not suggesting that they did it. It's just a hypothetical question.


Anyone that doesn't use the Nanke 14 factory specifically (or uses the factory only for some of their products).


Or perhaps even a vendor that had contracted more capacity (in Nanke 14) than the present market demands could benefit from a delay in production.


In theory Samsung or AMD could benefit from shortages of Qualcomm and NVidia products although in practice the magnitude of such substitution would probably be small.


I wonder how deep the pipeline is between inputs and failing tests on the output? I'd guess they'd have quite a bit of inventory in flight before they realized there was a problem with the output, but hell, I don't know.


3 months of inventory is "in flight" being manufactured at any given time; that is the minimum amount of affected inventory. Huge feedback loop: most chip designers can only squeeze in 2 revisions a year due to this.


Surely the cost per wafer is more or less the same no matter where you catch it, since you'll end up with a bubble in your manufacturing pipeline of one unit?


If it's an issue in an early stage of the production pipeline and the same chemical contamination is in place for more than one lot, you eat up to an entire pipeline full of product.


Wafer cost is supposed to be more like $4K although there may be penalties in this case.


$65k probably reflects how much lithography and work has gone into the wafer.


It's looking like my $65k figure might be the retail value of the ICs on the wafer; the silicon at TSMC is about $4k like wmk said.

Source: https://www.semiwiki.com/forum/attachments/content/attachmen...


No, I mean a finished wafer is $4K.


This could be good for AMD.


Apparently AMD's CPU/GPU chips for the PS4 Pro and Xbox One X are made in this fab, according to some other articles about this.


I thought AMD was using TSMC now?


This is about TSMC's "16/12nm process".

AMD is using TSMC's 7nm process and Global Foundries' 12/14nm processes: https://www.extremetech.com/computing/276169-amd-moves-all-7...



