Ask HN: How do CPUs handle bad transistors?
10 points by brokenmachine on July 21, 2021 | 11 comments
I've read that current CPUs have more than a hundred million transistors per square millimeter. I can't imagine that every single one of those works perfectly and will stay working perfectly for the entire lifetime of the CPU.

How do we design CPUs that don't die or stop working properly when one out of a hundred million transistors fails?




That's a great question with a rather interesting answer.

The difference between CPU models actually may not be due to different designs, but due to manufacturing defects. Intel, for example, might run a production line only for "tier 1" processors (e.g. i7), but during manufacturing some of the transistors inevitably fail to function. During quality testing they can identify which blocks are defective and configure the chip (typically by blowing on-die fuses) to disable them, and you end up with a lower-tier processor (e.g. i5, i3). This is commonly known as binning.
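To make that concrete, here's a toy sketch of the binning decision in Python (my own illustration; the tier names, thresholds and numbers are made up, and the real flow uses proprietary test programs and on-die fuses rather than anything this simple):

    # Toy model of post-manufacturing binning (illustrative only).
    def bin_die(working_cores, working_cache_mb):
        """Map a die's surviving resources to a hypothetical product tier."""
        if working_cores >= 8 and working_cache_mb >= 16:
            return "i7-class"   # fully functional die
        if working_cores >= 6 and working_cache_mb >= 12:
            return "i5-class"   # a core and/or cache block fused off
        if working_cores >= 4 and working_cache_mb >= 8:
            return "i3-class"   # more blocks disabled
        return "scrap"          # too many defects to sell

    # A die designed with 8 cores / 16 MB cache, found at test to have one
    # dead core and one bad 2 MB cache slice:
    print(bin_die(working_cores=7, working_cache_mb=14))  # -> "i5-class"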

Lots of really good information here: https://www.google.com/amp/s/www.techspot.com/amp/article/18...


Cool! I’ll take the liberty to post the permalink: https://www.techspot.com/article/1840-how-cpus-are-designed-...


Wow.


Modern designs sometimes include duplicated functional units, which allows the final chip to be configured into a working product even if a few of the units don't turn out right. But otherwise, getting all of those transistors working correctly is the big challenge of chip design. Yes, they all have to work. Yields on new processes are often very low, with only a fraction of the devices made actually working.
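As a rough illustration of why yield falls off so quickly with die size, a common first-order approximation is the Poisson yield model, yield = exp(-area × defect density). A quick sketch with made-up numbers:

    import math

    def poisson_yield(die_area_mm2, defects_per_mm2):
        """First-order Poisson model: fraction of dies with zero defects."""
        return math.exp(-die_area_mm2 * defects_per_mm2)

    # Hypothetical defect density of 0.001 defects/mm^2:
    for area in (100, 300, 600):            # small, medium, large die (mm^2)
        print(area, f"{poisson_yield(area, 0.001):.1%}")
    # -> 100 90.5%, 300 74.1%, 600 54.9%: bigger dies are far more likely to
    #    catch at least one defect, which is why redundancy and binning matter.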


Does anyone know of more detailed articles about how this (self-)checking architecture/process works?


I'm afraid I don't have any articles to direct you to, but I can give you a little detail on how it's done with RAM. DRAM is just a huge array of cells, and the array itself is made up of many repeated copies of a basic block. These blocks are independently wired and have on-die logic for testing (a simple memory check). The blocks that pass testing are configured by blowing fuses so they're connected to the normal input/output logic, and the broken ones have fuses blown to disconnect them from the power rails.

They slightly overbudget the number of blocks when designing, so if a few blocks are faulty they still have a working RAM chip. If a whole bunch of blocks aren't working, they can still sell it as a half- or quarter-capacity chip. An 8-, 16- and 32-gigabit RAM chip on the same process from the same manufacturer may well have the same die inside, just with different fuses blown.
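Here's a toy sketch of that capacity-binning outcome (purely illustrative; real parts repair at the row/column level with laser or electrical fuses, and the block counts below are invented):

    # Toy model of DRAM capacity binning from per-block test results.
    def bin_dram(block_results, block_size_gbit=1):
        """Given pass/fail per block, pick the largest sellable capacity."""
        good_capacity = sum(block_results) * block_size_gbit
        for sellable in (32, 16, 8):        # standard capacities
            if good_capacity >= sellable:
                return sellable
        return None                          # too few good blocks: scrap

    # A die overbudgeted with 34 one-gigabit blocks:
    print(bin_dram([True] * 32 + [False] * 2))   # 32 good -> sold as 32 Gbit
    print(bin_dram([True] * 20 + [False] * 14))  # 20 good -> sold as 16 Gbit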

This isn't new; it's been done since at least the 1980s. It's my understanding that the same thing is done with cache, cores, and even execution units on modern processors, though presumably with a great deal more complexity.


The book “VLSI Test Principles and Architectures: Design for Testability”, edited by Laung-Terng Wang et al., is quite good.


Cache has extra capacity, so bad parts can be mapped out during the testing process. If there's a fault in a core, the entire core is disabled. A fault in the uncore will probably cause the entire chip to be scrapped.
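A small sketch of that disposition logic (my own simplification; the component names and policies are invented for illustration):

    # Toy model: how faults in different parts of a die are handled at test.
    def disposition(faults):
        """faults: list of (component, detail) pairs found during test."""
        disabled_cores = set()
        spared_cache_ways = 0
        for component, detail in faults:
            if component == "uncore":
                return "scrap"               # shared logic fault: whole die lost
            if component == "core":
                disabled_cores.add(detail)   # fuse off the entire core
            if component == "cache":
                spared_cache_ways += 1       # map the bad way to spare capacity
        return (f"sell: {len(disabled_cores)} core(s) disabled, "
                f"{spared_cache_ways} cache way(s) remapped")

    print(disposition([("cache", "way 3"), ("core", 5)]))
    # -> sell: 1 core(s) disabled, 1 cache way(s) remapped
    print(disposition([("uncore", "memory controller")]))
    # -> scrap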


Transistors are actually very reliable, especially on a silicon wafer where they're protected from the elements. Unless there's a voltage spike somehow, they're unlikely to go bad.


That's why it's called an IC, an integrated circuit: either not a single transistor is bad, or the whole thing outright breaks down.


The keyword is "mercurial core", isn't it?



