Intel's 10nm Is Broken, Delayed Until 2019 (tomshardware.com)
242 points by ry4n413 on April 27, 2018 | 101 comments



I think 2019-2020 will be super interesting for hardware. Intel, Samsung, GloFo, and TSMC could all have competitive 7/10nm nodes. Unless either Intel or AMD makes some crazy IPC gains they should be fairly competitive with each other, and it will be interesting to see what the ARM giants can do as well. Hopefully Chinese investment into DRAM/NAND starts to come to fruition by then too.


We're at a point where "N nm" node names are meaningless. I want a list of feature dimensions to compare; that's the only meaningful way to do even a rough comparison these days.


You can think of power consumption as a resource -- if you reduce the power consumption of existing features, you can add more new features. The actual size isn't really that important (other than the cost of the silicon wafer).

To significantly reduce power consumption you need an improved process, and this is where smaller nodes are so important: they are currently the only really viable way to make a significant dent in power consumption.
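To put rough numbers on that, here's a minimal back-of-the-envelope sketch in C using the usual dynamic-power relation P ≈ a·C·V²·f. All of the activity/capacitance/voltage/frequency values below are made up for illustration; they aren't taken from any real process:

    #include <stdio.h>

    /* Back-of-the-envelope dynamic power, P = a * C * V^2 * f, for a
     * hypothetical block before and after a node shrink. Every number
     * below is made up for illustration. */
    static double dyn_power(double activity, double cap_farads,
                            double volts, double freq_hz)
    {
        return activity * cap_farads * volts * volts * freq_hz;
    }

    int main(void)
    {
        /* "Old" node: 100 nF of switched capacitance at 1.0 V, 3 GHz (assumed). */
        double p_old = dyn_power(0.2, 100e-9, 1.00, 3e9);
        /* Shrunk node: ~30% less capacitance, ~10% lower voltage (assumed). */
        double p_new = dyn_power(0.2, 70e-9, 0.90, 3e9);

        printf("old node: %.1f W\n", p_old);
        printf("new node: %.1f W (%.0f%% of old)\n",
               p_new, 100.0 * p_new / p_old);
        return 0;
    }

Even at the same clock, the assumed capacitance and voltage reductions from the shrink cut dynamic power to roughly 57% of the old figure, and that freed-up budget is what gets spent on new features.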


Now I kind of want to see the data on power consumption/transistor count over time.


You can find them if you look. E.g. TSMC's 7nm is roughly equivalent to Intel's 10nm in most dimensions; Samsung's 7nm is almost exactly Intel's 10nm in two dimensions. Etc., etc.


Processes at the "same level" are not automatically competitive with each other: cost, yield, and performance can all vary.

I think the cost and complexity of advancing lithography processes has increased so much that the technology risks have increased dramatically. Some foundries may blunder relative to others.

Intel has abandoned their tick-tock model for Process-Architecture-Optimization. In retrospect it seems clear that they knew the 10nm process was a risky step and that it was impossible to predict when it would be ready.


That was when they believed they'd stay 3 generations on the same process node.

Now it's more like Process-Architecture-Optimization-Increased Clock Speed And Power Consumption-We Don't Know What We're Doing Anymore

Also, you're right about the complexity. Intel's 10nm process is likely far more complex (more steps) than Samsung's 7nm EUV process, or even TSMC's and GloFo's 7nm DUV processes, which is why it's taking them so long.

In retrospect, Intel becoming a "manufacturer for ARM chip companies" seems laughable now, doesn't it?


“Because of the production difficulties with 10nm, Intel has revised its density target back to 2.4X [from 2.7X] for the transition to the 7nm node.”

Ouch. A day late and a dollar short.

They claim that this new node will still be better than TSMC's new node, but we are now in leapfrog mode, aren't we? Where you have a year's advantage over your competitor, and then they have the best tech, but you're already halfway to unseating them again?


And I bet they just mean TSMC's 7nm node there, but that's very misleading. By the time Intel has its 7nm node ready, TSMC will be ahead with its 5nm node.


They are effectively just code names. Intel's 10nm ~ TSMC's 7nm


I understand, and that's how they should be comparing them. I was saying Intel likely compared its 7nm with TSMC's 7nm, where of course Intel would "win". But by the time it "wins" that, they won't compete with TSMC's 7nm in the market anymore, but with its 5nm.


It's just a strange way to look at it. Intel's naming convention is more honest and closer to traditional measurements, and of course each marketing dept. will say they are the best in the world across all processes, regardless of what the competition calls their feature size.


That's possible, but Intel's 7nm is likely to be better than TSMC's 5nm regardless.


Can Intel work apace on their 7nm node when the previous one is behind schedule by two years, or does this derail those plans?


I'm no expert on this but I do know that their 7nm will switch to their long in-development EUV lithography process. I'd assume the work on 10nm is completely separate from their work on 7nm (there are even rumors that they'll just abandon 10nm if 7nm is ready before they work out the problems with 10nm).


Considering the troubles they have with 10nm and the fact that everyone else will have more expertise with EUV lithography than them (Intel hasn't even planned to use EUV until 5nm), that seems quite doubtful at this point.


Per the article:

> The company will switch to EUV at 7nm.


Can someone explain why it's important to increase the density instead of increasing the size of a CPU?

Knowing nothing about chip design I'm probably thinking about this the wrong way, but socket backwards compatibility aside, is it not feasible to simply increase the chip size? Is a higher density more rewarding?


Power, heat and the speed of electrical signal.

Power use increases quadratically with voltage. You want small transistors to keep the voltage and power use from getting out of hand. You also need to increase voltage if you want to increase clock speed.

An electrical signal travels in a conductor at roughly 15 cm/ns. With a 3 GHz clock speed the signal travels roughly 50 mm in one clock cycle. The largest microchips are about 30 mm across. You can't double the dimensions without dealing with the signal lag. Delivering the clock signal to every part of the chip in sync is already a problem. Modern microchips use lots of extra circuitry just to deliver the clock signal properly.
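For what it's worth, here's a tiny C sketch of that distance-per-cycle arithmetic, using the ~15 cm/ns figure above (the list of clock speeds is just an example):

    #include <stdio.h>

    /* Rough distance an on-chip signal can cover in one clock cycle, using
     * the ~15 cm/ns propagation figure from the comment above. */
    int main(void)
    {
        const double speed_mm_per_ns = 150.0;   /* ~15 cm/ns */
        const double clocks_ghz[] = { 1.0, 3.0, 5.0 };

        for (int i = 0; i < 3; i++) {
            double cycle_ns = 1.0 / clocks_ghz[i];
            printf("%.0f GHz: %.1f mm per cycle (largest dies are ~30 mm across)\n",
                   clocks_ghz[i], speed_mm_per_ns * cycle_ns);
        }
        return 0;
    }

At 3 GHz that's the ~50 mm per cycle mentioned above, which is only a couple of die-widths even before accounting for RC delay in real wires.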


> You also need to increase voltage if you want to increase clock speed.

Or use lower-VT cells (they turn "on" quicker), at the expense of increased leakage power. But at these geometries, faster clock speeds are getting less feasible and you need to find increased performance in other ways.

No logic signal needs to cross the entire die in one clock cycle; there is always an alternate design. For that reason, only registers 'talking' to each other need to see a clock at the same time, and even then there is a window. Clock routing is a consideration that takes resources, but it's not a problem. Realistically, a logic signal won't be going anywhere remotely near 50mm at 3GHz in 10nm, so the clock doesn't need to either.

Max die size is also limited by the vendor's tooling i.e. what their machines can literally handle. And also physical issues such as warpage. If you make a massive die and it heats up in a non-uniform manner (different bits of it get hot at different times), it expands in a non-uniform manner. This can lead to all kinds of problems.

Chips stuffed full of memory will yield better than logic-heavy chips, since large SRAMs now always include redundancy. So this too has an impact on how big you can go for a given cost. You can, however, get registers built from multiple storage elements whose output value is the consensus. I don't know how much these get used.


The numbers I mentioned are just physical maximums.

In reality it's the signal speed relative to jitter, time-interval errors, and data setup times that complicates the design and signal integrity.

If the clock is 3GHz, the margin of error is a fraction of the clock period. You need to divide the chip into clock regions and add a local cache for each core, because fetching data from far away is too slow.


Sure, but that's all independent of the size of the chip.


Whilst it's true that modern chips have problems with routing clocks, that's not really a limiting factor in chip size. You split the design into clock regions and have clock-crossing logic. There are obvious ways this happens: multi-core designs have different clocks for different cores, for example.


The cost of clock regions is slower speeds between tiles and more transistors and power wasted on clock management. In the end you need buffers, and caches for caches, to maintain speed and locality. All of this wastes transistors.


Do you have a good source on how clocks work in modern designs? From the high level down to the actual circuits?

I've always wondered why you can't generate clocks locally, but in a synchronized way. So basically like clock regions, but without having to add extra logic for data that goes between regions.


You can do this. It's called a "bus". CPU cores communicate with each other over a bus on the die itself (e.g. AMD's SerDes, which creates the Infinity Fabric, or Intel's "Mesh Network").

CPU cores each use a single clock. But when cores communicate, or the L3 cache communicates (cache coherency is needed if you want that mutex/spinlock to actually work), then you need some kind of communication mechanism between the CPU cores. Those clocks are likely "locally generated", but there needs to be a translation mechanism between bus and core.


It's possible I misunderstand you, but I don't think we're talking about the same thing.

With buses, you have different clocks and data moves between them. Like you said: CPU core 1 has its own clock, the bus between them has its own and different clock, and then CPU core 2 has its own clock which is yet again different. And in those cases you actually want different clocks, because you want to be able to boost CPUs independently from each other.

What I meant goes in another direction: instead of having a single powerful clock source for e.g. a CPU core, you have multiple smaller clock sources distributed throughout the core, but synchronized to each other so they run at the same frequency and phase. So data can move freely like it does today, but clock signals don't have to be distributed as far, which would hopefully make clock distribution easier and less power hungry.

It seems like such a thing should be possible, but perhaps there are good reasons why it isn't done?


Two things:

1. Clocks don't use a lot of power. Think of a pendulum: there's a lot of movement, but the energy constantly swings between gravitational potential energy and kinetic energy, so the device uses very little energy. Similarly, a clock circuit (called an oscillator) barely uses any electricity: it mostly "swings" energy back and forth between an inverter and a capacitor.

2. Distributing a clock over a long distance similarly uses very little power (!!) due to transmission-line theory. You can use the parasitic capacitance of the wires themselves to do this pendulum effect for efficient long-distance transmission of clocks. See: https://en.wikipedia.org/wiki/Transmission_line

This gif shows an animation of the pendulum effect in a longer transmission line: https://upload.wikimedia.org/wikipedia/commons/8/89/Transmis...
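If it helps to see the analogy in numbers, here's a toy C simulation of an ideal LC tank (with assumed 1 nH / 1 pF values, nothing taken from a real clock network): the energy visibly sloshes between the capacitor and the inductor while the total stays essentially constant, which is the "pendulum" point above:

    #include <stdio.h>

    /* Toy simulation of an ideal LC tank: energy swings between the capacitor
     * (0.5*C*V^2) and the inductor (0.5*L*I^2). Component values are assumed. */
    int main(void)
    {
        const double L = 1e-9;    /* 1 nH (assumed) */
        const double C = 1e-12;   /* 1 pF (assumed) */
        double v = 1.0;           /* capacitor voltage, V */
        double i = 0.0;           /* inductor current, A  */
        const double dt = 1e-13;  /* time step, s */

        for (int step = 0; step <= 2000; step++) {
            if (step % 250 == 0) {
                double e_cap = 0.5 * C * v * v;
                double e_ind = 0.5 * L * i * i;
                printf("t=%5.1f ps  E_cap=%.3e J  E_ind=%.3e J  total=%.3e J\n",
                       step * dt * 1e12, e_cap, e_ind, e_cap + e_ind);
            }
            /* Semi-implicit Euler keeps the total energy (nearly) constant. */
            i += dt * v / L;
            v -= dt * i / C;
        }
        return 0;
    }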

----------------

I guess things could be de-sync'd for more efficiency. But your question is kind of like "Well, can't we get rid of V-Tables in C++ to make branch-prediction more efficient??"

I mean, we can. But v-tables / polymorphism really don't take a lot of time. We only do that if the performance gain really matters.


Interesting, thanks. I'll see if I can grok this from the link you gave.

I do have one follow-up question though: I was under the impression that clock trees contain repeaters in the form of CMOS inverters. Wouldn't those have dynamic leakage which the transmission line stuff doesn't account for?


I'm not really an expert at the VLSI level, I'm simply thinking from a PCB-perspective (and I just know that some of the same issues occur in the smaller chip-level design).

From my understanding: yes, the CMOS inverters will certainly use power. But you can minimize the use of them through some passive techniques.

Looking into the issue more, it does seem like a naive implementation of synchronized clocks can become costly. But at the same time, I'm seeing a number of research papers suggesting that people have been applying transmission-line techniques to the clock distribution problem.

I've always assumed that it was something that was commonly done at the chip level, but apparently not. These papers were published ~2010 or so.


Some signals are sent in such a way that you can recover the clock signal from the state transitions. Then use that clock signal to drive the rest of your circuit.


Thank you for your answer. Very interesting. I was vaguely aware of how clock cycles work but the signal lag problem didn't even occur to me.


I highly recommend Code by Charles Petzold. Personally, it helped me to have a more intuitive understanding of clock cycles and the underlying architecture of computers. https://www.goodreads.com/book/show/44882.Code


the quality and preciseness of this answer is why i read hn every day.


One reason is cost. A larger size means fewer chips are produced per wafer, which means every single chip costs more, roughly in proportion to its area.

Another reason is yield. Defects are inevitable. Their probability per unit of area is roughly constant, i.e. it doesn't depend on the area of a single chip. Therefore, the probability that a single chip is defect-free falls off exponentially with chip area, so with larger chips the fraction of usable chips drops very quickly.
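A minimal sketch of that in C, using the classic Poisson yield model Y = exp(-D·A); the defect density below is an assumed number, not any foundry's real figure:

    #include <stdio.h>
    #include <math.h>

    /* Poisson yield model: the chance a die has zero defects is
     * Y = exp(-D * A), so yield falls off exponentially with die area.
     * The defect density is assumed, purely for illustration. */
    int main(void)
    {
        const double defects_per_mm2 = 0.001;          /* assumed: 0.1 / cm^2 */
        const double die_mm2[] = { 100.0, 200.0, 400.0, 800.0 };

        for (int i = 0; i < 4; i++) {
            double yield = exp(-defects_per_mm2 * die_mm2[i]);
            printf("%4.0f mm^2 die: %5.1f%% of dies defect-free\n",
                   die_mm2[i], 100.0 * yield);
        }
        return 0;
    }

In this toy example, repeatedly doubling the die area drops the defect-free fraction from about 90% down to about 45%.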


You can always separate the chip into smaller cores and disable the ones that don't work.


That tactic works well for GPUs. But for CPUs, you can only do that sometimes, not always.

Cores only make up about half of the area. If the defect is not in a core but in, e.g., the RAM controller or the IO controller, you have to throw away the whole chip.


I asked someone who worked on chips a similar question. The density helps with connecting the parts: if the nodes aren't near each other, signals have to move through other nodes to get from one to the other, and that is "slow" in the world of chips.

I guess it's like trying to get from LA to SF. You could still get to SF if you 10x'd the size of the Earth, but it would take more than 10x as long despite having the same connections, because you'd need to stop for some extra connections just to make it.


> Can someone explain why it's important to increase the density instead of increasing the size of a CPU?

* Yields: silicon wafers have a fairly regular rate of manufacturing defects. A bigger die means more failed CPUs, grossly increasing prices. Smaller dies isolate those defects better, leading to better yields. Let's say there are around 20 defects per wafer: 100 chips per wafer would result in roughly 80 to 85 good chips per batch (numbers worked through in the sketch after this list).

If you shrunk the die so that you had 500 chips per wafer, then you'd have about 480 chips after manufacturing (20 defects).

Wafers are a constant size. Errors are relatively constant as well. You can't change those numbers.

* Power: Smaller feature sizes use less power. Smaller capacitance, so the signals travel faster and generally speaking the design can be clocked higher.
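And as promised above, a quick C sketch that reproduces the numbers in the yields bullet (20 defects scattered over 100 vs. 500 dies per wafer, assuming defects land uniformly at random and any hit die is scrapped):

    #include <stdio.h>
    #include <math.h>

    /* Scatter D random defects over N dies; a given die dodges all of them
     * with probability (1 - 1/N)^D. Numbers match the comment above. */
    int main(void)
    {
        const double defects = 20.0;
        const int dies_per_wafer[] = { 100, 500 };

        for (int i = 0; i < 2; i++) {
            int n = dies_per_wafer[i];
            double survive = pow(1.0 - 1.0 / n, defects);
            printf("%3d dies/wafer: ~%.0f good dies (%.1f%% yield)\n",
                   n, n * survive, 100.0 * survive);
        }
        return 0;
    }

That gives ~82 good dies out of 100 and ~480 out of 500, i.e. the smaller die wastes far less of the wafer per defect.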


Aside from the other reasons provided, there's also something called the reticle limit. The manufacturing process will have a maximum die size, due to the finite size of the optics used to expose the photoresist.


It is basically an issue with the thermals of the chip: a larger size means more heat.


This right here is the big one, to be honest. Yes, clocks, latency between parts of the die, etc are all problems - but they can be worked around with some effort.

Thermals and also power delivery are huge problems with large chips: just compare the massive 471 mm² die of the GP102 (1080 Ti/Titan X) to the 150 mm² die of the Coffee Lake hexacore chips. GP102 can draw 250-300W depending on boost clock, and the Core i7-8700K can also draw upwards of 200W depending on how high you push the clocks and vCore (to keep said clocks stable).

There's a reason why board-partner GPUs always have huge coolers attached to them, and why people pushing CPU clocks are often using at least a giant air cooler like the Hyper 212 EVO or an AIO liquid cooler with a 240mm+ radiator.

Hell, let's skip thermals and just talk electricity: getting 200W+ of stable power to the cores on these dies is no easy task as-is. That's why you have people like buildzoid doing reviews of power delivery on motherboards and GPU boards, to see if the VRMs are going to blow up trying to power your expensive hardware if you're overclocking (or sometimes even if you aren't).

All in all, we have thermal and power scaling issues at current chip sizes - making them bigger isn't particularly feasible unless everybody is going to start installing 360mm radiators in their system and even that might not be enough depending on clock speeds and the vCore required to maintain them.


An 8700K pushes ~130W at a 5GHz all-core OC under full AVX load without offset; even under LN2 you won't get 200W from an 8700K.


Cost too. Many costs scale with die area, so a smaller chip has a cost advantage over a big chip, all else being equal.


> Can someone explain why it's important to increase the density instead of increasing the size of a CPU?

If you compare semiconductors to crops, this determines how many bucks you get from an acre of silicon wafer.


I do believe that the savings from getting more dies out of a wafer are non-negligible (wafers themselves are somewhat cheap, but the equipment they wear out with each pass is not), but my guess is that they aren't the main driver behind process shrinks.


If necessity is the mother of invention, then we'll need a lot better software shortly.


TBH, I think we just need much more efficient software. This is something we can do, and used to do more of. It is insane how inefficient many of the things we write today are compared to how they could be. I'd love it if efficient code were a reasonable goal again, but at the moment it just isn't worth the time to use more efficient technologies and do optimisation, because it usually isn't really necessary. I've spent time optimising code where it was really necessary, but it so rarely is that I've only once had the opportunity to spend as much time on it as I'd like.


Amen to this. In '89, a Sun SPARCstation 1 ran a decent implementation of UNIX with a GUI on a 20 MHz processor with just 8 MB of RAM. Now we have dual-core processors running at 700+ MHz with 0.75GB of RAM powering smartwatches that do vastly less. I mean, sure, it's technologically impressive and all, but one has to wonder what the devil they are doing in there that requires all that power?


Using the wrong compilation models?

It is not even the case of using GC and such.

Our phones are way more powerful than any Xerox PARC or Lisp Machine ever was.


For the longest time it's been a truism that developer time is a lot more expensive than CPU time, which leads to things like the proliferation of Electron.

I get why it's done, but part of me wishes that CPU growth would stall off for a bit so the software industry would have no choice but to treat optimization as a selling point again.


Do we though? Think about how many engineers the big 5 are hiring... they can't get enough! If we really needed more efficient software, they'd have to hire even more!

The focus should be on developer productivity first, then product iteration, then performance.


Yes, but once Moore’s law really stops, the need for product iteration will too, unless you can get functionality gains from increased performance. Once you start caring about performance, you’ll rapidly realize the “productivity” languages slow you down.

For example, try to compute CRC32 checksums of 100-byte arrays at bus speed in C/C++, and then in the higher-level language of your choice.
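For concreteness, here's a minimal bitwise CRC-32 benchmark in C (reflected IEEE polynomial). This is only a sketch: the buffer contents and iteration count are made up, and a serious implementation would be table-driven or use hardware CRC instructions. Porting the same loop to a higher-level language makes the comparison the parent suggests:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    /* Bitwise CRC-32 (reflected IEEE 802.3 polynomial 0xEDB88320). */
    static uint32_t crc32(const uint8_t *buf, size_t len)
    {
        uint32_t crc = 0xFFFFFFFFu;
        for (size_t i = 0; i < len; i++) {
            crc ^= buf[i];
            for (int b = 0; b < 8; b++)
                crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
        }
        return ~crc;
    }

    int main(void)
    {
        uint8_t msg[100];                 /* 100-byte array, as in the parent */
        memset(msg, 0xAB, sizeof msg);

        const long iters = 1000000;       /* arbitrary iteration count */
        volatile uint32_t sink = 0;
        clock_t t0 = clock();
        for (long i = 0; i < iters; i++) {
            msg[0] = (uint8_t)i;          /* keep the compiler from hoisting the call */
            sink ^= crc32(msg, sizeof msg);
        }
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
        printf("crc=%08x, ~%.1f MB/s\n", (unsigned)sink,
               (double)iters * sizeof msg / (secs * 1e6));
        return 0;
    }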


Can I use rust as my higher level language of choice? :)

In all seriousness though, I wonder if once Moore's law really stops we'll see a surge of innovation in compiler and language paradigms, born out of necessity. It seems to me that the revolution in higher-level languages and paradigms came because we could (computationally) afford it, so it makes sense that once the incentives change we'll see innovation in other areas.


I would think other HN readers are more at the center of things than me and better equipped to comment, but my perception is, comparing the last ten years to the ten years before that, we have seen Moore's law stop and we have seen a surge of innovation in software, and even in chip design when you consider things like TPUs. It seems odd to present it as speculation about the future.


High-level languages made it easier to write complicated programs without shooting yourself in the foot.

The idea that the Big 5 want to hire like crazy is a myth.


I bet Pascal, Ada and a few others will achieve similar results.


I agree. And barring some groundbreaking invention that completely alters the hardware landscape, I think the trend will be heterogeneous/domain specific ICs. Which means new software development paradigms as well.


Coincidentally, I had my PhD defense yesterday. This trend is the primary motivation for my work.

http://beza1e1.tuxen.de/phd

The research project is http://invasic.de


Impressive! Is it normal for edu .de pages to be in English?


Congratulations!


These are really hard to sell unless the domain has already taken off widely. This has only really happened to graphics accelerators and to a lesser extent crypto.


I can’t help thinking we should have headed down this path a long time ago. It’s harder to go backwards.


If we are talking about just-in-time, managed vs. ahead-of-time, compiled-to-native, parts of the industry have already been changing course in the last 5 years, even if not always with full commitment (more of a "just in case"). The Java and .NET platforms have been gaining ahead-of-time compilation support from the main vendors. Microsoft has been reviving its C++ support. Various new programming languages with static typing and high performance as priorities are gaining traction.


In Microsoft's case it is even more radical: UWP is basically the initial design for .NET, based on COM as the ABI instead of a managed one, hence .NET Native and C++ as the main languages.

I really like this change of course.


It's not so much the underlying technology but what we have built upon it.


Recently I've been working with some legacy Win32 C/C++ code and had the thought: why don't I take all of these hard-coded #defines and Win32 message numbers, window-class names, etc. and put these magic numbers in an SQLite database instead?

Of course I'd be loading a lot of data from disk into RAM that would have simply been encoded as immediate values in the executable before. But my rough estimate still puts this at (at least) an order of magnitude less RAM than abstraction layers like WPF use.

And that's when I realized: we jumped from severely resource-constrained designs straight to completely wasteful designs without adequately exploring the design space in between. To paraphrase Richard Feynman's "There's plenty of room at the bottom": there's plenty of room in the middle.
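As a purely hypothetical sketch of that idea in C: the constants.db file, the constants table, and the WM_USER row below are all invented for illustration, but the SQLite calls are the standard C API. The point is just that a table lookup at startup can replace a pile of #defines without dragging in a WPF-sized abstraction layer:

    #include <stdio.h>
    #include <sqlite3.h>

    /* Look up a "magic number" (say, a Win32 message ID) from an SQLite table
     * at startup instead of baking it in as a #define. Schema is invented. */
    static int lookup_constant(sqlite3 *db, const char *name, int *out)
    {
        sqlite3_stmt *stmt = NULL;
        int rc = sqlite3_prepare_v2(db,
            "SELECT value FROM constants WHERE name = ?1", -1, &stmt, NULL);
        if (rc != SQLITE_OK) return rc;

        sqlite3_bind_text(stmt, 1, name, -1, SQLITE_STATIC);
        rc = sqlite3_step(stmt);
        if (rc == SQLITE_ROW) {
            *out = sqlite3_column_int(stmt, 0);
            rc = SQLITE_OK;
        }
        sqlite3_finalize(stmt);
        return rc;
    }

    int main(void)
    {
        sqlite3 *db = NULL;
        if (sqlite3_open("constants.db", &db) != SQLITE_OK) {
            fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
            return 1;
        }
        int wm_user = 0;
        if (lookup_constant(db, "WM_USER", &wm_user) == SQLITE_OK)
            printf("WM_USER = 0x%04x\n", wm_user);
        sqlite3_close(db);
        return 0;
    }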


It works both ways: they are inventing some insane light sources for EUV litho - https://www.youtube.com/watch?v=5yTARacBxHI


Bye bye JavaScript


Sure, JavaScript could be more efficient, but it's not like everyone developing it is waiting for the 2nm Intel process to fix it...


Let's just hire Mike Pall to make a JavaScript engine.


Hello Forth.pl


And good riddance.


It seems like we're going to have to stick to 16GB of RAM on MacBook Pro for another 1.5 years or so, aren't we?

From what I understand LPDDR4 as part of Cannon Lake is directly affected by this delay.


> From what I understand LPDDR4 as part of Cannon Lake is directly affected by this delay.

Do we have confirmation of this? It seems startling to release yet another generation of laptop CPUs without this... but I admit I can’t find anything to the contrary.

Pretty much anyone building mid-to-high-end laptops must be livid.


Coffee Lake does not support LPDDR4. Cannon Lake does support LPDDR4, but it is now delayed. As a matter of fact, I am wondering if they might skip Cannon Lake and go to Ice Lake instead.

There is no other rumoured "lake" in between, so yes, another generation without a 32GB RAM laptop.

This information is widely available everywhere; I'm not sure what you want as confirmation.


The linked article mentions “Whiskey Lake” as an intermediate 14nm range (actually it says “desktop”, but I’ve seen it mentioned in a laptop context elsewhere). But I fear you’re right that LPDDR4 support is still being left for Cannon Lake...


> Pretty much anyone building mid-to-high-end laptops must be livid.

Well, those which put a premium on sleep/suspend battery life, which is the big edge of LPDDR4 over regular DDR4. IIRC the "active" energy consumption of DDR4 was already lowered to LPDDR levels.


> sleep/suspend autonomy

What does that mean?

Do other manufacturers, offering 32gb, use desktop RAM? What kind of compromises would that bring to a MBP?


> Do other manufacturers, offering 32gb, use desktop RAM?

They use regular DDR4. Talking about "desktop" RAM may not be the best for comprehension: DDR4 is available as SODIMM modules, and lots of work went into making DDR4 significantly less power-hungry than DDR3.

> What kind of compromises would that bring to a MBP?

IIRC it would burn battery ~30% faster when sleeping (e.g. when you close the lid, unless you have changed the configuration to be strictly "suspend to disk", which requires going through pmset and the command-line).


Battery life, cooling, and possibly even physical dimensions. Which is why the big heavy gaming PCs with lots of RAM have laughable battery life.


I'd sacrifice all three of those to be able to use a Windows VM with reasonable performance.


If you're fine with it being large, having no battery life/using a lot of power, why are you buying a laptop at all?


So I can take it home every day.

I'm not _fine_ with it, but right now I need 32gb of ram more than I need those things.


A good time to bring up: https://newsroom.intel.com/editorials/moores-law-setting-the...

> Second, in today’s world Moore’s Law can be delivered only by a few companies. Every new process node gets harder and therefore more expensive. [...]

> So, no, Moore’s Law is not ending at any time we can see ahead of us.

Meanwhile, Moore's original paper (http://www.monolithic3d.com/uploads/6/0/5/5/6055488/gordon_m...) was always about the trend of putting _more_ components on a chip being more cost-effective than keeping the number of components constant. Surely if this were still the case we'd have seen Skylake++++ with cores << n by now.


Does this create a situation where Apple will be forced to switch away in order to differentiate, or does Intel plan to offer other kinds of improvements to their larger customers? Sounds like greater vertical integration will become a key differentiator. Admittedly, if the gaps between ARM and x86 Chromebooks are any indication, there’s still some room for Intel to be competitive. But does that hold for someone targeting a higher price point and with more control end-to-end?


Apple has already announced that they'll start using their own chips by 2020 - Intel's problems may be part of the reason why: https://9to5mac.com/2018/04/02/report-apple-to-begin-switch-...


Apple has announced no such thing.

I suggest you actually read the article you linked.


Are we talking about the lithographic process itself as the problem, or other manufacturing problems like vias, etching, and diffusion? The article is a little bit vague on the actual problems Intel is experiencing. But I doubt Intel will tell us anyway ;)


So only NUCs will get Cannon Lake after all? With rumors that it contains a discrete Radeon 500 GPU it might be the first cheap Steam Box for everyone...


Ha, "cheap". Have you looked at the AMD+Intel hybrid CPU prices? I don't think either AMD or Intel is particularly interested in that being a price/perf leader. Especially from the AMD side who already sells zen CPUs with AMD GPUs inside.


Ok, so they are delaying it again without a reasonable explanation why except that "it is hard". But the real question is:

Does it meltdown?


Your meltdown question gave me an idea: what if they made extremely fast CPUs that were designed to melt down, and you used them like a consumable?


I think a missile designer did this one time. The missile wasn't expected to exist very long after it was turned on, so they cranked up the power on the CPU and didn't include much of a heatsink. Saved weight but testing was very expensive.


It was memory, not CPU:

https://groups.google.com/forum/message/raw?msg=comp.lang.ad...

"He went on to point out that they had calculated the amount of memory the application would leak in the total possible flight time for the missile and then doubled that number. [...] Since the missile will explode when it hits it's target or at the end of it's flight, the ultimate in garbage collection is performed without programmer intervention."


If it prevents duds then it's worth the effort.


That sounds horrific. Stop talking before you give them any ideas.


It would be for emergency calculations, or for running games that would otherwise cost $$$, for $.


Yeah, 'cause I want a subscription service / micro-payments for my CPU as well.


Intel already tried CPU DLC with the Pentium G6951 back in 2010: you could buy a $50 upgrade card that would unlock Hyper-Threading and extra L3 cache. Plus there are now the VROC hardware keys to unlock NVMe RAID support on X299 CPUs.

Microtransactions were the next step for the gaming industry after traditional expansion packs and then DLC, let's just hope we don't get CPU loot boxes (though I guess we already have that simply due to the silicon lottery, eh?).


Maybe they need more time to implement the new backdoors...



