I think 2019-2020 will be super interesting for hardware. Intel, Samsung, GloFo, and TSMC could all have competitive 7/10nm nodes. Unless either Intel or AMD makes some crazy IPC gains, they should be fairly competitive with each other, and it will be interesting to see what the ARM giants can do as well. Hopefully Chinese investment into DRAM/NAND starts to come to fruition by then too.
We're at a point where "N nm" node names are meaningless. I want a list of feature dimensions to compare. That's the only meaningful way to do a rough comparison these days.
You can think of power consumption as a resource -- if you reduce the power consumption of existing features, you can add more new features. The actual size isn't really that important (other than the cost of the silicon wafer).
To significantly reduce power consumption you need an improved process, and this is where smaller nodes are so important: they are currently the only really viable way to significantly cut power consumption.
You can find them if you look. E.g. TSMC's 7nm is roughly equivalent to Intel's 10nm in most dimensions. Samsung's 7nm is almost exactly Intel's 10nm in two dimensions. Etc etc.
Processes in the "same level" are not automatically competitive against each other. Cost, yield and performance can vary.
I think the cost and complexity of advancing lithography processes has increased so much that the technology risks have increased dramatically. Some foundries may blunder relative to others.
Intel has abandoned their tick-tock model for Process-Architecture-Optimization. In retrospect it seems clear that they knew the 10nm process was a risky step and that it was impossible to predict when it would be ready.
That was when they believed they'd stay 3 generations on the same process node.
Now it's more like Process-Architecture-Optimization-Increased Clock Speed And Power Consumption-We Don't Know What We're Doing Anymore
Also, you're right about the complexity. Intel's 10nm process is likely far more complex (more steps) than Samsung's 7nm EUV process, or even TSMC's and GloFo's 7nm DUV processes, which is why it's taking them so long.
In retrospect, Intel becoming a "manufacturer for ARM chip companies" seems laughable now, doesn't it?
“Because of the production difficulties with 10nm, Intel has revised its density target back to 2.4X [from 2.7X] for the transition to the 7nm node.”
Ouch. A day late and a dollar short.
They claim that this new node will still be better than TSMC’s new node, but we are now in leapfrog mode aren’t we? Where you have a year advantage on your competitor and then they will have the best tech but you’ll be halfway to unseating them again?
And I bet they just mean TSMC's 7nm node there, but that's very misleading. By the time Intel has its 7nm node ready, TSMC will be ahead with its 5nm node.
I understand, and that's how they should be comparing them. I was saying Intel likely compared its 7nm with TSMC's 7nm, where of course Intel would "win". But by the time it "wins" that, they won't compete with TSMC's 7nm in the market anymore, but with its 5nm.
It's just a strange way to look at it. Intel's naming convention is more honest and closer to traditional measurements, and of course each marketing dept. will say they're the best in the world across all processes, regardless of what the competition calls their feature size.
I'm no expert on this but I do know that their 7nm will switch to their long in-development EUV lithography process. I'd assume the work on 10nm is completely separate from their work on 7nm (there are even rumors that they'll just abandon 10nm if 7nm is ready before they work out the problems with 10nm).
Considering the troubles they have with 10nm and the fact that everyone else will have more expertise with EUV lithography than them (Intel hasn't even planned to use EUV until 5nm), that seems quite doubtful at this point.
Can someone explain why it's important to increase the density instead of increasing the size of a CPU?
Knowing nothing about chip design I'm probably thinking about this the wrong way, but socket backwards compatibility aside, is it not feasible to simply increase the chip size? Is higher density more rewarding?
Power use increases quadratically with voltage. You want small transistors to keep the voltage and power use from getting out of hand. You also need to increase voltage if you want to increase clock speed.
An electric signal travels through a conductor at roughly 15 cm/ns. At a 3 GHz clock speed, the signal travels roughly 50 mm in one clock cycle. The largest microchips are about 30 mm across. You can't double the dimensions without dealing with signal lag. Delivering the clock signal to every part of the chip in sync is already a problem; modern microchips use lots of extra circuitry just to deliver the clock signal properly.
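If you want to play with those numbers, here's a quick back-of-the-envelope sketch in C; the ~15 cm/ns propagation speed is the figure from the comment above, treated as an assumption.

    /* Back-of-the-envelope: how far a signal can travel per clock cycle,
     * assuming on-chip propagation of ~15 cm/ns (figure from the comment above). */
    #include <stdio.h>

    int main(void) {
        const double speed_mm_per_ns = 150.0;       /* 15 cm/ns = 150 mm/ns */
        const double clocks_ghz[] = { 1.0, 3.0, 5.0 };

        for (int i = 0; i < 3; i++) {
            double period_ns = 1.0 / clocks_ghz[i]; /* length of one cycle in ns */
            double reach_mm  = speed_mm_per_ns * period_ns;
            printf("%.1f GHz: one cycle = %.3f ns, signal reach ~ %.0f mm\n",
                   clocks_ghz[i], period_ns, reach_mm);
        }
        return 0;                                   /* at 3 GHz: ~50 mm, as above */
    }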
> You also need to increase voltage if you want to increase clock speed.
Or use lower-VT cells (they turn "on" quicker), at the expense of increased leakage power. But at these geometries, pushing clock speeds higher is getting less feasible and you need to find increased performance in other ways.
No logic signal needs to cross the entire die in one clock cycle, there is always an alternate design. For that reason, only registers 'talking' to each other need to see a clock at the same time, and even then there is a window. Clock routing is a consideration that takes resources but it's not a problem. Realistically, a logical signal won't be going anywhere remotely near 50mm at 3GHz in 10nm, so the clock doesn't need to either.
Max die size is also limited by the vendor's tooling i.e. what their machines can literally handle. And also physical issues such as warpage. If you make a massive die and it heats up in a non-uniform manner (different bits of it get hot at different times), it expands in a non-uniform manner. This can lead to all kinds of problems.
Chips stuffed full of memory will yield better than a logic-heavy chip, since large SRAMs now always include redundancy. So this too has an impact on how big you can go for a given cost. You can however get registers that are built of multiple storage elements, the output value of which is the consensus. Don't know how much these get used.
In reality it's the signal speed relative to jitter, time interval errors and data setup times that complicates the design and signal integrity.
If the clock is 3 GHz, the margin of error is a small fraction of the clock period. You need to divide the chip into clock regions and add a local cache for each core, because fetching data from far away is too slow.
Whilst it's true that modern chips have problems with routing clocks, that's not really a limiting factor in chip size. You split the design into clock regions and have clock-crossing logic. There are obvious ways this happens - multi-core designs have different clocks for different cores, for example. That's not really the limiting factor for chip size.
The cost of clock regions is slower speeds between tiles and more transistors and power wasted in clock management. In the end you need buffers, and caches for the caches, to maintain speed and locality. All this wastes transistors.
Do you have a good source on how clocks work in modern designs? From the high level down to the actual circuits?
I've always wondered why you can't generate clocks locally, but in a synchronized way. So basically like clock regions, but without having to add extra logic for data that goes between regions.
You can do this. It's called a "bus". CPUs communicate with each other over a bus on the die itself (e.g. AMD's SerDes links, which form the Infinity Fabric, or Intel's "Mesh Network").
Each CPU core uses a single clock. But when cores communicate, or the L3 cache communicates (cache coherency is needed if you want that mutex / spinlock to actually work), you need some kind of communication mechanism between the CPU cores. Those clocks are likely "locally generated", but there needs to be a translation mechanism between bus and core.
It's possible I misunderstand you, but I don't think we're talking about the same thing.
With buses, you have different clocks and data moves between them. Like you said: CPU core 1 has its own clock, the bus between them has its own and different clock, and then CPU core 2 has its own clock which is yet again different. And in those cases you actually want different clocks, because you want to be able to boost CPUs independently from each other.
What I meant goes in another direction: instead of having a single powerful clock source for e.g. a CPU core, you have multiple smaller clock sources distributed throughout the core, but synchronized to each other so they run at the same frequency and phase. So data can move freely like it does today, but clock signals don't have to be distributed as far, which would hopefully make clock distribution easier and less power hungry.
It seems like such a thing should be possible, but perhaps there are good reasons why it isn't done?
1. Clocks don't use a lot of power. Think of a pendulum: there's a lot of movement, but the energy constantly swings between gravitational potential energy and kinetic energy, so the device uses very little energy. Similarly, a clock circuit (called an oscillator) barely uses any electricity: it mostly "swings" energy back and forth between an inverter and a capacitor.
2. Distributing a clock over a long distance similarly uses very little power (!!) due to transmission line theory. You can use the parasitic capacitance of the wires themselves to get this pendulum effect for efficient long-distance transmission of clocks. See: https://en.wikipedia.org/wiki/Transmission_line
I guess things could be de-sync'd for more efficiency. But your question is kind of like "Well, can't we get rid of V-Tables in C++ to make branch-prediction more efficient??"
I mean, we can. But V-Tables / Polymorphism really doesn't take a lot of time. We only do that if the performance gain really matters.
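To make the "pendulum" picture in point 1 a bit more concrete, here's a toy simulation of an ideal LC tank in C; the component values are made up purely for illustration, and a real on-chip oscillator is of course more complicated.

    /* Toy ideal LC "tank": energy swings between the capacitor (0.5*C*V^2)
     * and the inductor (0.5*L*I^2), like a pendulum. Values are arbitrary. */
    #include <stdio.h>
    #include <math.h>

    int main(void) {
        const double PI = 3.141592653589793;
        const double L = 1e-9;       /* 1 nH inductor (made up) */
        const double C = 1e-12;      /* 1 pF capacitor (made up) */
        const double dt = 1e-14;     /* time step, far smaller than the period */
        double V = 1.0;              /* all energy starts on the capacitor */
        double I = 0.0;

        printf("resonant frequency ~ %.2f GHz\n",
               1.0 / (2.0 * PI * sqrt(L * C)) / 1e9);

        for (long step = 0; step <= 200000; step++) {
            /* semi-implicit (symplectic) Euler keeps the total energy bounded */
            I += dt * V / L;
            V -= dt * I / C;
            if (step % 40000 == 0) {
                double e_cap = 0.5 * C * V * V;
                double e_ind = 0.5 * L * I * I;
                printf("t=%5.2f ns  E_cap=%.3e J  E_ind=%.3e J  total=%.3e J\n",
                       step * dt * 1e9, e_cap, e_ind, e_cap + e_ind);
            }
        }
        return 0;
    }

In the ideal (lossless) case the total stays constant while the energy sloshes back and forth; the losses in a real chip come from resistance and from the CMOS buffers, not from the storage itself.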
Interesting, thanks. I'll see if I can grok this from the link you gave.
I do have one follow-up question though: I was under the impression that clock trees contain repeaters in the form of CMOS inverters. Wouldn't those burn dynamic switching power that the transmission-line picture doesn't account for?
I'm not really an expert at the VLSI level, I'm simply thinking from a PCB-perspective (and I just know that some of the same issues occur in the smaller chip-level design).
From my understanding: yes, the CMOS inverters will certainly use power. But you can minimize the use of them through some passive techniques.
Looking into the issue more, it does seem like a naive implementation of synchronized clocks can become costly. But at the same time, I'm seeing a number of research papers suggesting that people have been applying transmission-line techniques to the clock distribution problem.
I've always assumed that it was something that was commonly done at the chip level, but apparently not. These papers were published ~2010 or so.
Some signals are sent in such a way that you can recover the clock signal from the state transitions. Then use that clock signal to drive the rest of your circuit.
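A toy sketch of that idea in C (not any particular protocol; the edge timestamps are made up): since every interval between transitions is a whole multiple of the bit period, you can recover a clock estimate from the edges alone.

    /* Toy clock recovery: estimate the bit period from edge timestamps alone.
     * Edge-to-edge gaps are integer multiples of the bit period, so the smallest
     * gap is a first guess, then refined by averaging. Timestamps are made up
     * (true period here is 10 ns). */
    #include <stdio.h>

    int main(void) {
        const double edges[] = { 0.0, 10.1, 30.0, 39.9, 69.8, 80.0, 110.2 }; /* ns */
        const int n = sizeof edges / sizeof edges[0];

        /* first guess: the smallest edge-to-edge interval */
        double guess = edges[1] - edges[0];
        for (int i = 2; i < n; i++) {
            double d = edges[i] - edges[i - 1];
            if (d < guess) guess = d;
        }

        /* refine: each interval spans round(d / guess) bit periods */
        double total_time = 0.0;
        long total_bits = 0;
        for (int i = 1; i < n; i++) {
            double d = edges[i] - edges[i - 1];
            total_time += d;
            total_bits += (long)(d / guess + 0.5);
        }
        printf("recovered bit period ~ %.2f ns (first guess was %.2f ns)\n",
               total_time / total_bits, guess);
        return 0;
    }

Real links use line codes that guarantee enough transitions (and a PLL instead of a simple average), but the principle is the same.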
I highly recommend Code by Charles Petzold. Personally, it helped me to have a more intuitive understanding of clock cycles and the underlying architecture of computers. https://www.goodreads.com/book/show/44882.Code
One reason is cost. A large size means there are fewer chips produced per wafer, which means every single chip will cost more, proportionally to the area of a single chip.
Another reason is yield. Defects are inevitable. Their probability per unit of area is roughly constant, i.e. it doesn't depend on the area of a single chip. Therefore, the probability that a given chip is free of defects falls off exponentially with chip area, so with larger chips the yield drops very quickly.
That tactic works well for GPUs. But for CPUs, you can only do that sometimes, not always.
Cores make up only about half of the area. If the defect is not in a core but in e.g. the RAM controller or IO controller, you have to throw away the complete chip.
I asked someone who worked on chips a similar question. The density helps with connecting the parts. If the nodes aren't near each other, signals have to move through other nodes to get from one to the other, and that process is "slow" in the world of chips.
I guess it's like trying to get from LA to SF. You could still get to SF if you 10x'd the size of the Earth, but it would take more than 10x as long despite having the same connections, because you'd need to stop at some extra connections just to make it.
> Can someone explain why it's important to increase the density instead of increasing the size of a CPU?
* Yields: silicon wafers have regular manufacturing errors. A bigger die means more failed CPUs, grossly increasing prices. Smaller dies isolate those errors better, leading to better yields. Let's say there are around 20 defects per wafer. 100 chips per wafer would result in roughly 80 to 85 successful chips per batch (see the sketch after this list).
If you shrunk the die so that you had 500 chips per wafer, then you'd have about 480 chips after manufacturing (20 defects).
Wafers are a constant size. Errors are relatively constant as well. You can't change those numbers.
* Power: Smaller feature sizes use less power. Smaller capacitance, so the signals travel faster and generally speaking the design can be clocked higher.
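For what it's worth, here's a rough sketch of the yield arithmetic from the first bullet, using a simple Poisson model; real fabs use fancier yield models, and the 20 defects/wafer figure is just the assumption from above.

    /* Rough yield sketch with a Poisson defect model: the chance that a chip is
     * defect-free is exp(-defects_per_chip), so yield falls off exponentially
     * with die area. 20 defects/wafer is the number assumed above. */
    #include <stdio.h>
    #include <math.h>

    static double good_chips(int chips_per_wafer, int defects_per_wafer) {
        double defects_per_chip = (double)defects_per_wafer / chips_per_wafer;
        return chips_per_wafer * exp(-defects_per_chip);
    }

    int main(void) {
        printf("100 chips/wafer, 20 defects: ~%.0f good chips\n", good_chips(100, 20));
        printf("500 chips/wafer, 20 defects: ~%.0f good chips\n", good_chips(500, 20));
        return 0;  /* prints ~82 and ~480, matching the rough numbers above */
    }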
Aside from the other reasons provided, there's also something called the reticle limit. The manufacturing process will have a maximum die size, due to the finite size of the optics used to expose the photoresist.
This right here is the big one, to be honest. Yes, clocks, latency between parts of the die, etc are all problems - but they can be worked around with some effort.
Thermals and power delivery are huge problems with large chips. Just compare the massive 471 mm2 die of the GP102 (1080 Ti/Titan X) to the 150 mm2 die of the Coffee Lake hexacore chips. The GP102 can draw 250-300W depending on boost clock, and the Core i7-8700K can also draw upwards of 200W depending on how high you push the clocks and vCore (to keep said clocks stable).
There's a reason why board-partner GPUs always have huge coolers attached to them, and why people pushing CPU clocks are often using at least a giant air cooler like the Hyper 212 EVO or an AIO liquid cooler with a 240mm+ radiator.
Hell, let's skip thermals and just talk electricity - getting 200W+ of stable power to the cores on these dies is no easy task as-is. That's why you have people like buildzoid reviewing the power delivery on motherboards and GPU boards, to see if the VRMs are going to blow up trying to power your expensive hardware if you're overclocking (or sometimes even if you aren't).
All in all, we have thermal and power scaling issues at current chip sizes - making them bigger isn't particularly feasible unless everybody is going to start installing 360mm radiators in their system and even that might not be enough depending on clock speeds and the vCore required to maintain them.
I do believe that the savings from getting more dies out of a wafer are non-negligible (wafers themselves are somewhat cheap, but the equipment they're wearing out with each pass - not so much), but my guess is that they aren't the main driver behind the process shrinking.
TBH, I think we just need much more efficient software. This is something we can do, and used to do more. It is insane how inefficient many things we write today are compared to how they could be. I'd love it if efficient code was a reasonable goal again, but at the moment, it just isn't worth the time using more efficient technologies and doing optimisation because it usually isn't really necessary. I've spent time optimizing code where it was really necessary, but it so rarely is that I've only once had the opportunity to spend as much time as I'd like to on this.
Amen to this. In '89, a Sun Sparcstation 1 ran a decent implementation of UNIX with a GUI on a 20 MHz processor with just 8 MB of RAM. Now we have dual core processors running at 700+ MHz and 0.75GB of RAM used to power smartwatches that do vastly less. I mean, sure, it's technologically impressive and all but one has to wonder what the devil are they doing in there that requires all that power?
For the longest time it's been a truism that developer time is a lot more expensive than CPU time, which leads to things like the proliferation of Electron.
I get why it's done, but part of me wishes that CPU growth would stall off for a bit so the software industry would have no choice but to treat optimization as a selling point again.
Do we though? Think about how many engineers the big 5 are hiring... they can't get enough! If we really needed more efficient software, they'd have to hire even more!
The focus should be on developer productivity first, then product iteration, then performance.
Yes, but once Moore’s law really stops, the need for product iteration will too, unless you can get functionality gains from increased performance. Once you start caring about performance, you’ll rapidly realize the “productivity” languages slow you down.
For example, try to compute CRC32 checksums of 100 byte arrays at bus speed in C/C++ and then in the higher level language of your choice.
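For reference, the C side of that comparison might look something like the plain table-driven CRC32 below (the standard reflected 0xEDB88320 polynomial); a seriously tuned version would use slice-by-8 tables or the CPU's CRC instructions instead.

    /* Plain table-driven CRC32 (IEEE polynomial, reflected form 0xEDB88320),
     * applied to a dummy 100-byte buffer like the benchmark described above. */
    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    static uint32_t crc_table[256];

    static void crc32_init(void) {
        for (uint32_t i = 0; i < 256; i++) {
            uint32_t c = i;
            for (int k = 0; k < 8; k++)
                c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
            crc_table[i] = c;
        }
    }

    static uint32_t crc32(const uint8_t *buf, size_t len) {
        uint32_t c = 0xFFFFFFFFu;
        for (size_t i = 0; i < len; i++)
            c = crc_table[(c ^ buf[i]) & 0xFFu] ^ (c >> 8);
        return c ^ 0xFFFFFFFFu;
    }

    int main(void) {
        uint8_t msg[100];
        crc32_init();
        for (int i = 0; i < 100; i++) msg[i] = (uint8_t)i;  /* dummy payload */
        printf("crc32 = 0x%08x\n", (unsigned)crc32(msg, sizeof msg));
        return 0;
    }

In a higher-level language you'd typically reach for a library routine and then pay per-call and boxing overhead on tiny 100-byte inputs, which is exactly where the gap shows up.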
Can I use rust as my higher level language of choice? :)
In all seriousness though, I wonder if once Moore's law really stops we'll see a surge of innovation in compiler and language paradigms, born out of necessity. It seems to me that the revolution in higher-level languages and paradigms came because we could (computationally) afford it, so it makes sense that once the incentives change we'll see innovation in other areas.
I would think other HN readers are more at the center of things than me and better equipped to comment, but my perception is, comparing the last ten years to the ten years before that, we have seen Moore's law stop and we have seen a surge of innovation in software, and even in chip design when you consider things like TPUs. It seems odd to present it as speculation about the future.
I agree. And barring some groundbreaking invention that completely alters the hardware landscape, I think the trend will be heterogeneous/domain specific ICs. Which means new software development paradigms as well.
These are really hard to sell unless the domain has already taken off widely. This has only really happened to graphics accelerators and to a lesser extent crypto.
If we are talking about just-in-time, managed vs ahead-of-time, compiled-to-native, parts of the industry have already been changing course in the last 5 years - even if not always with full commitment, but more of a "just in case".
The Java and .NET platforms have been gaining ahead-of-time compilation support from the main vendors. Microsoft has been reviving C++ support. Various new programming languages with static typing and high performance as a priority are gaining traction.
In Microsoft's case it is even more radical: UWP is basically the initial design for .NET, based on COM as the ABI instead of a managed one, hence .NET Native and C++ as main languages.
Recently I've been working with some legacy Win32 C/C++ code and had the thought: why don't I take all of these hard-coded #defines and Win32 message numbers, window-class names, etc. and put these magic numbers in an SQLite database instead?
Of course I'd be loading a lot of data from disk into RAM that would have simply been encoded as immediate values in the executable before. But my rough estimate still puts this at (at least) an order of magnitude less RAM than abstraction layers like WPF use.
And that's when I realized: we jumped from severely resource-constrained designs straight to completely wasteful designs without adequately exploring the design space in between. To paraphrase Richard Feynman's quote, "There's plenty of room at the bottom": "There's plenty of room in the middle."
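A minimal sketch of what that kind of lookup could look like with the SQLite C API; the table name and schema here (constants(name TEXT, value INTEGER)) are hypothetical, just to illustrate the idea.

    /* Hypothetical lookup of a "magic number" from an SQLite database instead of
     * a hard-coded #define. Assumed schema: constants(name TEXT PRIMARY KEY,
     * value INTEGER). Build with -lsqlite3. */
    #include <stdio.h>
    #include <sqlite3.h>

    static int lookup_constant(sqlite3 *db, const char *name, int *out_value) {
        sqlite3_stmt *stmt = NULL;
        int found = 0;

        if (sqlite3_prepare_v2(db, "SELECT value FROM constants WHERE name = ?1;",
                               -1, &stmt, NULL) != SQLITE_OK)
            return 0;

        sqlite3_bind_text(stmt, 1, name, -1, SQLITE_STATIC);
        if (sqlite3_step(stmt) == SQLITE_ROW) {
            *out_value = sqlite3_column_int(stmt, 0);
            found = 1;
        }
        sqlite3_finalize(stmt);
        return found;
    }

    int main(void) {
        sqlite3 *db = NULL;
        int wm_paint = 0;

        if (sqlite3_open("constants.db", &db) != SQLITE_OK) {
            fprintf(stderr, "can't open db: %s\n", sqlite3_errmsg(db));
            sqlite3_close(db);
            return 1;
        }
        if (lookup_constant(db, "WM_PAINT", &wm_paint))
            printf("WM_PAINT = 0x%04X\n", wm_paint);
        sqlite3_close(db);
        return 0;
    }

You'd still want to cache the results in memory after the first query, but the point stands: the data lives in one small file rather than in layers of framework abstraction.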
From what I understand, LPDDR4 support as part of Cannon Lake is directly affected by this delay.
Do we have confirmation of this? It seems startling to release yet another generation of laptop CPUs without this... but I admit I can’t find anything to the contrary.
Pretty much anyone building mid-to-high-end laptops must be livid.
Coffee Lake does not support LPDDR4. Cannon Lake does support LPDDR4 but is now delayed. As a matter of fact, I am wondering if they might skip Cannon Lake and go to Ice Lake instead.
There is no other rumoured "lake" in between, so yes, another generation without 32GB RAM laptops.
This information is widely available everywhere; not sure what you want as confirmation.
The linked article mentions “Whiskey Lake” as an intermediate 14nm range (actually it says “desktop”, but I’ve seen it mentioned in a laptop context elsewhere). But I fear you’re right that LPDDR4 support is still being left for Coffee Lake...
> Pretty much anyone building mid-to-high-end laptops must be livid.
Well, those which put a premium on sleep/suspend battery life, which is LPDDR4's big edge over regular DDR4. IIRC the "active" energy consumption of DDR4 was already lowered to LP levels.
> Do other manufacturers, offering 32gb, use desktop RAM?
They use regular DDR4; talking about "desktop" RAM may not be the best for comprehension. DDR4 is available as SODIMM modules, and lots of work went into making DDR4 significantly less power-hungry than DDR3.
> What kind of compromises would that bring to a MBP?
IIRC it would burn through the battery ~30% faster when sleeping (e.g. when you close the lid, unless you have changed the configuration to be strictly "suspend to disk", which requires going through pmset and the command line).
A good time to bring up: https://newsroom.intel.com/editorials/moores-law-setting-the...
>Second, in today’s world Moore’s Law can be delivered only by a few companies. Every new process node gets harder and therefore more expensive.
[...]
>So, no, Moore’s Law is not ending at any time we can see ahead of us.
while Moore's original paper: http://www.monolithic3d.com/uploads/6/0/5/5/6055488/gordon_m...
was always about the trend of putting _more_ components in a chip being more cost effective than keeping the number of components constant. Surely if this were still the case we'd have seen Skylake++++ with cores << n by now.
Does this create a situation where Apple will be forced to switch away in order to differentiate, or does Intel plan to offer other kinds of improvements to their larger customers? Sounds like greater vertical integration will become a key differentiator. Admittedly, if the gaps between ARM and x86 chrome books are any indication, there’s still some room for Intel to be competitive. But does that hold for someone targeting a higher price point and with more control end-to-end?
Are we talking about the lithography process itself as the problem, or other manufacturing problems like vias, etching, diffusion? The article is a little bit vague on the actual problems Intel is experiencing. But I doubt Intel will tell us, anyway ;)
So only NUCs will get Cannon Lake after all? With rumors that it contains a discrete Radeon 500 GPU it might be the first cheap Steam Box for everyone...
Ha, "cheap". Have you looked at the AMD+Intel hybrid CPU prices? I don't think either AMD or Intel is particularly interested in that being a price/perf leader. Especially from the AMD side who already sells zen CPUs with AMD GPUs inside.
I think a missile designer did this one time. The missile wasn't expected to exist very long after it was turned on, so they cranked up the power on the CPU and didn't include much of a heatsink. Saved weight but testing was very expensive.
"He went on to point out that they had calculated the amount of memory the application would leak in the total possible flight time for the missile and then doubled that number. [...] Since the missile will explode when it hits it's target or at the end of it's flight, the ultimate in garbage collection is performed without programmer intervention."
Intel already tried CPU DLC with the Pentium G6951 back in 2010: you could buy a $50 upgrade card that would unlock hyper-threading and extra L3 cache. Plus there's now the VROC hardware keys to unlock NVMe RAID support on X299 CPUs.
Microtransactions were the next step for the gaming industry after traditional expansion packs and then DLC, let's just hope we don't get CPU loot boxes (though I guess we already have that simply due to the silicon lottery, eh?).