10GHz at under 1V by 2005 - The future of Intel’s manufacturing processes [2000] (anandtech.com)
278 points by cft on Feb 8, 2017 | 164 comments



Article says: Obviously this 8 – 10GHz clock range would be based on Intel’s 0.07-micron process that is forecasted to debut in 2005. These processors will run at less than 1 volt, 0.85v being the current estimate.

Intel introduced a 65 nm (0.065 micron) process in 2006. The "Cedar Mill" Pentium 4 processor ran at 3.6 GHz at a whopping 1.3V although a small double-pumped part of the processor ran at 7.2 GHz. It could be overclocked to 4.5/9.0 GHz at 1.4V.

The discrepancy between 0.85V and 1.3V was caused by the end of Dennard scaling. Basically transistors require much more voltage than predicted and thus consume far more power than predicted. Although the transistors can technically run at 9 GHz, the resulting power density is very difficult to cool.


Yes, more about the end of Dennard scaling: https://en.wikipedia.org/wiki/Dennard_scaling#Breakdown_of_D...


It appears that the double-pumped adds aren't used any more. I thought they would have been good to keep, adds being so common.


They are not used because with transistors that keep getting smaller but that don't get much faster, it's better to just have more add units in parallel than it is to run the ones you have faster. A modern Intel CPU effectively has 4 add units to the 2 of the P4.


> Although the transistors can technically run at 9 GHz, the resulting power density is very difficult to cool.

But nowadays we have processors with multiple cores, where sometimes you need only 1 core (and it needs to be fast). So would it be an idea to increase clock frequency for those cores, but multiplex them quickly to allow them to cool?


While the clock speed has largely stagnated, the actual work done per cycle, even on just one core, has gone up significantly. Consider: A double-precision fused multiply-add consumes only 4 cycles of result latency today. The number of memory operations (and other instructions) in flight at any given moment has gone up dramatically, the number of execution units has gone up a little bit (so that the maximum instruction-level parallelism is higher), and so on.

It's not the rapid growth of the '90s and early 2000s, but it is still growth.


I would argue that we pretty much reached peak single-core x86_64 scalar instruction stream concurrency ~6 years ago with Sandy Bridge. SIMD has gotten wider since then (AVX2, etc) and there are occasional new instructions for certain workloads (including FMA as you noted), but general purpose scalar (non-SIMD) workloads have not gotten much of an IPC boost. Actually, to the extent that those workloads have gotten faster, it's been mostly from the return of clock frequency scaling - from low 3GHz to low 4GHz on the desktop SKUs.


Nope. It is not just thermals but also memory latency. If you have four cores and each has two register files you can get 8x the bandwidth at the same latency.

That 10GHz talk was a lie on the part of Intel to intimidate people away from AMD: not only would a 10GHz P4 melt down, but it would be stalled all the time from memory latency. So many things did not work that it was not an honest mistake.

Today there is talk of a big clock rate bump (to 200 GHz or so) if they go to a different semiconductor, but at that point you probably need a fiber optic or terahertz wave link to memory to keep the pipeline full.


>Today there is talk of a big clock rate bump (to 200 GHz or so) if they go to a different semiconductor, but at that point you probably need a fiber optic or terahertz wave link to memory to keep the pipeline full.

You talk as if there couldn't possibly be a benefit to an increase in speed without a corresponding increase in memory bandwidth. Whilst it wouldn't be an optimally efficient system, if we /could/ bump to 9GHz (or 200GHz), wouldn't it be worth doing so for at least some kinds of calculations, even if the memory can't keep up?

edit: Both responses were super-interesting. Don't wanna reply to both, but thanks all :)


There's a term for this: computational intensity (also called arithmetic intensity), i.e. the ratio of useful compute operations per memory load in an app.

Are there apps that have high computational intensity? Sure, matrix multiply is one of them. That's one of the reasons why dense linear algebra serves as the standard benchmark to determine the top 500 supercomputers in the world.

But even in HPC (high performance computing), many if not most apps actually have relatively low computational intensity (i.e. in the range of one or so compute operations per word of memory loaded). In this regime, it really doesn't make sense to grow compute out of proportion with memory bandwidth because you'll just be idling the processors.
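
To put rough numbers on it, here's a back-of-the-envelope sketch in Python (my own illustration, assuming double-precision operands, not anything from the article):

    # Arithmetic intensity = useful flops per byte moved to/from memory.
    def intensity(flops, bytes_moved):
        return flops / bytes_moved

    n = 1024
    # Dense n x n matrix multiply: ~2n^3 flops over ~3n^2 doubles.
    matmul = intensity(2 * n**3, 3 * n**2 * 8)
    # Streaming vector add c = a + b: n flops over 3n doubles.
    vecadd = intensity(n, 3 * n * 8)
    print(f"matmul:     {matmul:8.1f} flops/byte  (grows with n -> compute-bound)")
    print(f"vector add: {vecadd:8.3f} flops/byte  (tiny and constant -> memory-bound)")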

And while I have no proof, I'd expect HPC applications to generally be more computationally intense than general consumer computing tasks. So I'd expect that computational intensity goes mostly down from here.


Maybe you could do cryptography more quickly, but for general-purpose computing, and even most specialized tasks, memory latency and bandwidth are critical. For instance, look at the use of GDDR5 and HBM together with GPUs.

Most of the market is for things that are generalizable; maybe you could make some kind of hyper-DSP for millimeter wave base stations or something like that, but you have to spread out the development cost across a low number of units.


Any source on the clock rate jump? That's more than I had heard.


That is a long-term target: you can definitely get transistors to switch that fast. 200GHz would not be a short-term target, but would it happen in 20 years? Maybe.


Other people have covered Intel's TurboBoost feature, but this reminds me of something GM did with the Cadillac Northstar V8 back in the early 90's.

https://en.wikipedia.org/wiki/Northstar_engine_series

When the engine did NOT have coolant, it would run only one bank of cylinders at a time and alternate between the two to let them cool.

The system was at least somewhat effective. There's a story from the time about a journalist that was testing the feature in the desert. He stopped at a truck stop after conducting the test and amazed the folks at the stop by opening the hood, filling the engine with coolant, and driving off like nothing was amiss.


That's just a bit different to Intel Turbo Boost which has been widely deployed for a while. When fewer cores are needed it will increase clock speed on a small number of cores if there is work to do.

Some BIOSes have settings to completely disable a bunch of cores to enable more turbo boosting.

This isn't as sophisticated thermally as what you outlined, but it saves on cache coherence.


The general term for that is Dark Silicon [1]. It helps a little and, as others have pointed out, Intel CPUs already have a similar feature called "Turbo Boost." NVIDIA processors also have a similar "GPU Boost" feature. But I don't know if that can enable a single core to run at 10GHz. Shutting off other cores does not lower the local power density / thermal dissipation of the single core at 10GHz. You still have to work hard to cool that single core, in order to prevent the silicon from being damaged.

[1] https://en.wikipedia.org/wiki/Dark_silicon


Right, but the suggestion is that when "one core" is running, you switch around which core it is that's actually hot, so they're taking turns generating the heat.

Obviously there's some cost there in sharing registers, cache, etc, but it's an interesting notion.


I see, that is an interesting notion. But how do you "share registers" at 10 GHz, without massive stalls during context switches? The suggestion seems to require that the cores are separated by some considerable physical distance, in order to allow for heat dissipation. However, this distance also means that data sharing between the cores would be slow and also very power intensive in itself. I'm not a chip designer though, just wildly guessing.


I don't think you'd need to. I expect you'd be able to run the core very fast for a number of milliseconds before having to switch.

If performed at the hardware level, you may also get some improvement by having the core push its register and low level cache state directly to the next processor, rather than have them pull the data through shared cache or RAM. Unlike a typical context switch, the process is not being resumed from idle, but has an active cache state.


The Windows kernel already does this automatically, at a period of around 10-100ms. If you look at a single-threaded program it will actually utilise all cores, not one; by default its affinity isn't locked and the thread skips around.


Interesting, do you have a source for this? I'm curious to know more about the rationale for doing such a thing.


The Linux kernel does the same thing. Run htop and run a single threaded workload. You'll see it migrate around between all the cores.
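
If you want to watch it without htop, here's a quick Linux-only sketch (it reads field 39, "processor", from /proc/self/stat; just an illustration):

    import time

    def current_cpu():
        with open("/proc/self/stat") as f:
            stat = f.read()
        # comm (field 2) is parenthesised, so split after the closing ')'.
        fields = stat[stat.rindex(")") + 2:].split()
        return int(fields[36])  # field 39 overall = CPU last executed on

    for _ in range(20):
        sum(i * i for i in range(200_000))  # keep the thread busy
        print("running on CPU", current_cpu())
        time.sleep(0.5)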


There's a limited form of this already - on many multicore processors if you're running a single-threaded workload then "turbo boost" or similar will allow the one running core to clock higher than normal. Not aware of the multiplexing idea being implemented, but I'm not sure how much value that would give - I suspect the overall amount of heat to be dissipated is the limiting factor more than where exactly it is in the package.


https://en.wikipedia.org/wiki/Intel_Dynamic_Acceleration

introduced in ~2007 in Core 2 mobile CPUs (Penryns, I think). You can load the Middleton modded BIOS into your oldschool ThinkPad R61/T61 and it will enable dual IDA[1] on top CPUs.

http://forum.notebookreview.com/threads/t61-x61-sata-ii-1-5-...

You can even load a (Chinese-)modded Middleton BIOS, solder one wire, and drop in an $8 T9550 CPU for 2.66GHz boosting to 2.8GHz. You could even use quads, but it's beyond uneconomical, with those CPUs costing more than a whole used ThinkPad T420.

http://forum.thinkpads.com/viewtopic.php?f=29&t=110620

These were Intel's first steps, and not a lot of OEMs enabled IDA. Turbo Boost was next, introduced with Nehalem.

https://en.wikipedia.org/wiki/Intel_Turbo_Boost

[1] Dual IDA is a trick forcing IDA on both cores. The ThrottleStop utility is also able to turn it on.


The fun thing is that they were reducing the power with each stepping by that time.


Ah yes, the 10GHz CPU. Now if you run 6 cores at 2.5GHz each[1], is that 15GHz? By the end of 2001 it was clear that Intel really needed to rethink things. And ever since that time we've had a series of "linear projections" which have come up short (process nodes, power levels, CPU power). Not to say we don't have some pretty awesome toys to play with that can do a lot more than machines of a decade ago could do, but every time someone takes an exponential and extends it out into the future to make a point, I stop and ask "And what if this is the end of this s-curve?" Nature hates an exponential almost as much as she hates vacuums :-)

[1] https://www.amazon.com/Intel-Xeon-2620-Processor-BX80621E526...


There's an argument that when the current s-curve (sigmoid) starts to reach its limit we start looking for other ways of obtaining improvements, such that over time we have a series of sigmoids and overall exponential progress... until the final sigmoid.

It reminds me of Malthus thinking the economy/world was doomed because we were about to run out of trees, and then we started making significant use of coal (coal had been used for a long time, but only on a small scale).


It's interesting that while we aren't at 10 GHz quite yet, the functionality that was mentioned in the article [1][2] is here and works even on budget smartphones.

--

[1] Imagine being able to speak normally with your computer as you would a secretary sitting next to you and have your computer accurately and quickly take notes from your speech.

[2] Imagine logging onto your computer not via a user name and a password but by sitting in front of your display and having it scan your face to figure out if you are allowed access to the computer.


Is this really the experience of voice recognition people have? I can't get Siri to reliably create appointments... takes about 3 tries for the voice recognition to work correctly.


Siri's voice recognition definitely isn't state of the art, although it got a lot better for me with iOS 10. However even the Google app on iOS has leaps & bounds better recognition in my experience.


The "OK, Google" voice interface on Android, whatever it's brand name is, is amazing. Do try it out. It's able to understand me even with a lot of noise (street sounds).

Also, there might be something wrong with the mic if you're using an external bluetooth mic?


While it is amazing in its comprehension, it feels incredibly slow. Even just saying "OK, Google" takes a bit too long for it to respond, while saying "OK, vijucat" would probably elicit an immediate response.

This is just my experience however on my Nexus 5 with very good wifi. Hopefully others are having a better experience, or there is a fix for the lag, because despite these problems, I use OK Google (Google Now?) quite often.


Definitely. Google Home picks up everything perfectly on the level that I'm really surprised the few times it actually fails.


With Amazon Echo this was exactly my experience for the first week. Since then, I've learned where it fails and am becoming more annoyed by it. The clear difference between current voice recognition and a real person is that these assistants don't adapt to the way you talk. They don't learn if they misunderstood you once.

As an example, if I ask Alexa to play a certain kind of music and she plays the wrong one, I'll have to specify it further to get the correct music. Next time I'd expect her to get the clue, but she'll make the same mistake again.

Though the most annoying thing is that chained commands don't really work with any assistant. I'd expect to give 4-5 commands at once and have them executed. Activating it for each command is very annoying.


I have an iPhone now and don't bother with Siri because it's wrong more often than right.

Between Apple, Google, and Microsoft, to me only Cortana is worth using. For example, if I say "text prentice", neither Siri nor Google Now can figure it out. But 100% of the time Cortana picks it up.

I do miss my windows phone :(


On Android, it has worked well for me for simple things (e.g. "Remind me to take my laundry off the hanger tomorrow at 15:00"). Have not tried anything fancy, though.

A coworker once told his smartphone, "Tell me a dirty joke". It kind of worked, it opened a web search for "dirty joke". Cortana really impressed me by actually telling a joke; it was about dirty laundry (but I suspect that was intentional to avoid offending people). ;-)


I use it obsessively. Currency conversions, fact checking, basic maths, "what's the tip for $20", sending texts, sending WhatsApp messages, making phone calls, "flip a coin", "wake me up at 7", setting timers, "Add to calendar"....

It works wonderfully well. I use it obsessively to the point that people now joke that if you ask me a question I'll simply ask Google about it.


How did you find out what you could tell it? After the first ten useless Google searches that popped up, I was looking for a vocab list. And then I gave up.


Honestly, I think I've built an instinct on what works and what doesn't. Simple statements do work, as do some simple questions. It's a bit of trial and error, but if it's a simple fact check, it's most likely you can phrase the question in a manner that gets you an answer.

E.g. I randomly tried: What do I do when I burn my finger? and it answered. Same for: How do I get ink stains out of my shirt?


Siri is pathetic, really. Try Google Now, or the new Assistant on Google Pixel.


Example 1, if we're referring to the likes of Siri and Alexa, isn't thanks to improvements in personal computer technology - said platforms send your speech recording off to a massive datacenter for processing; no need for 10GHz processors there.

Example 2 requires the use of depth-sensing and heat-sensitive cameras to avoid trivial "show a photo of an authorized user" attacks - that's not really CPU-dependent either.


Not sure if Siri and Alexa (and Google Now) send every recording to a data center - the .NET speech recognition libraries ship with Windows, so all audio data stays local to your PC, afaik. I'd expect Cortana leverages these libraries as well, instead of sending all data to a remote server.

You can build basic functionality into a speech bot in Powershell ("PowerShiri, what time is it?" "PowerShiri, what is the weather?") as a weekend project: https://news.ycombinator.com/item?id=11663029


At least with Google Now, you can go into your Google Account and listen to recordings of everything you've ever asked via voice search:

https://myactivity.google.com/item?product=29

(If the Product 29 parameter doesn't work, click on "Item View" in the left and filter by "Voice & Audio".)


Cortana sends it off to a server like everyone else. Those libraries are probably not Microsoft's latest and greatest.

Modern speech models are quite big; not so big that you couldn't load one on your desktop, but big enough that you would notice. Couple that with the fact that the search or service call is going to happen on a server anyway, and client-side processing doesn't make sense beyond a few functions.


Alexa can only recognize the keywords. Which is also why you can only choose between 3, because the device isn't able to detect anything else. After that, the command will be analyzed in Amazon's datacenter.

I don't know of any smartphone or similar device that would interpret voice commands locally.


Android has had face unlock built in for some time. A naive way it mitigates the "unlock with photo" vulnerability is an option requiring you to blink. Not as robust as 3D, heat-sensitive cameras, but it's at least not as trivial as showing a photo to beat it.

The impressive amount of processing power available on many smartphones today certainly contributes to this being a practical unlock method.


You're right, you also need to sweep across the eyes with a pencil to defeat that. In case you're interested in more detail, Google starbug's excellent talk "ich sehe, also bin ich... du" (there's an English translation).


Is face unlock fooled by video of a person blinking?


Siri sends data to the cloud for processing what you're asking for, but the actual speech-to-text can work locally these days. Try it - put your iPhone in airplane mode and fire up dictation - works like a charm.


Can I ask what you would dictate that doesn't need an internet connection? A note or a shopping list come to mind, but that's it for me.


Actually I hate the fact that Google sends my voice over the network, not because of privacy but because I have to pay for the additional data sent.


How much do you pay for data, and how much do you dictate, for this to be an actual gripe?

I assume Google encodes the data in something like iSAC [1], requiring 32 kbit/s for good quality speech, so an hour of dictating is 3600 x 32/(8 x 1024) = 14 MB.

[1] https://en.m.wikipedia.org/wiki/Internet_Speech_Audio_Codec
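
(Same estimate as a throwaway Python snippet, assuming that 32 kbit/s figure:)

    bitrate_kbit_s = 32            # assumed speech-codec bitrate
    mb_per_hour = 3600 * bitrate_kbit_s / 8 / 1024
    print(f"~{mb_per_hour:.0f} MB per hour of dictation")   # ~14 MB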


Dictation in iOS9 and later can be used without an internet connection on newer iPhones.


RE example 1, there is offline voice recognition, though.


The functionality existed on desktop computers at the time, so it's not really a prediction.


I certainly don't remember anything of usable quality being available to end-users. My Windows 98 machine could barely play 360p videos and didn't even have a camera. The voice recognition that I do remember trying was of very low accuracy.

Do you remember any specifics of where & how exactly these features were available?


State of the art for dictation was probably Dragon Dictate / Naturally Speaking.

https://en.wikipedia.org/wiki/Dragon_NaturallySpeaking

Windows had an API for basic command recognition, in late 90's giving the ability to navigate through telephony menus, for example:

https://en.wikipedia.org/wiki/Microsoft_Speech_API#SAPI_1-4_...



It was completely usable at the time, but did require a relatively beefy PC. Doctors and attorneys very commonly used it to dictate letters.


And for context "beefy PC" here means less CPU power and RAM than the cheapest smart phone today.


That's a lot better than the results I get from Google voice today. Writing code isn't what it's optimised for though.


We had people in the office in the late 90s with repetitive stress issues who were using Dragon NaturallySpeaking pretty effectively.


But where have the repetitive stress issues gone?


Ageism: if your wrists are toast by age 40 but no one gets hired over 30, or whatever the exact numbers are, it doesn't matter much as long as A is higher than B.


Dragon Dictate worked pretty well in 2001 from what I recall, if you were willing to train it.


I remember around 2007/2008 Sony VAIOs had facial recognition login. I freaked out on how decent it was, then thought about the DMV and Facebook being able to use this and really freaked out.


Not sure about the software, but webcams were available from the early '90s. I think there was an old Amiga magazine that came with some facial recognition demoware.

For voice there were things like Dragon that could do reasonably accurate full voice dictation. IME it was more accurate than Google voice is today because it was trained.

Moving these to the cloud is more about controlling data than it is about lack of local computing power.


Also interesting is that modern speech and image recognition runs on NVIDIA GPUs using neural networks, with the Intel CPU relegated to playing a supporting actor role. GPUs also face the same Dennard scaling problem, and don't run anywhere close to 10GHz, but their microarchitecture makes better use of the available silicon/power for parallel numerical computing problems.


> have your computer accurately and quickly take notes from your speech

Well, about those accurately and _your_ computer parts… :)


My first computer ran at a touch over 1MHz, and I've had a lifetime of needing (sic) to upgrade my computer every couple of years to chase or improve upon the same perceived performance levels.

It's refreshing that -- despite not having 10GHz CPU's available -- my 6yo CPU/mobo still does not feel in any way slow or in need of upgrade.

Yes, other components have offset the relatively slow pace of CPU improvements (GPU's for gamers, SSD's for everyone, etc) but it feels like we're enjoying a lengthy era of 'good enough grunt' on the CPU front (for most of us).


>6yo CPU/mobo still does not feel in any way slow or in need of upgrade

The limiting factors have changed- or rather, the limiting computers have changed.

Throwing more client horsepower or internet speed at the problem can't solve the fact that the server (and to a limited extent, an internet connection) has to deliver both a huge amount of JavaScript and the (typically image/video/audio-heavy) content itself.

The client is always underutilized- the fact that smartphone apps (or rather, cached local copies of what would normally be a website) have similar performance to a modern desktop PC should speak volumes about that. It's probably why Microsoft smartphone-ized Windows; though their execution of that was awful and it didn't help that WinRT wasn't mature on release.

And raw server performance (like desktop performance) has been at a plateau for a while now as well. The lack of competition for Intel doesn't help that (maybe AMD's new processor line will start driving improvements again, but there are no hard numbers on performance; and higher-TDP ARM designs aren't currently competitive with Intel's in this space either).

Until this changes, and there's nothing to indicate it will on Intel's roadmaps, clients will continue to be good enough- it might be the first time in history where computers (since 2008 or so) are replaced because the hardware failed and not because they were insufficiently fast.


"it might be the first time in history where computers (since 2008 or so) are replaced because the hardware failed and not because they were insufficiently fast."

You're forgetting two things:

1 - Games

2 - VR

Games especially have a very long history of having a bottomless appetite for more and more powerful hardware every year as they push the limits of what's possible.

VR is dramatically upping hardware requirements, and those requirements are going to exponentially increase as consumers start to demand and expect 8k per eye, full motion, high framerate, 360 degree, 3D, wide field of view, interactive VR -- all on smaller and lighter headsets, ideally wirelessly.


>Games especially have a very long history of having a bottomless appetite for more and more powerful hardware every year as they push the limits of what's possible.

Sure, though the CPU has less to do with that. For reference, most modern games still perform just fine on first-gen i7s and second-gen i5s; the most recent of those was released 6 years ago. GPUs are a replaceable part where CPUs are not.

At least GPU technology is still advancing in big ways, though I think that's more a property of how they're built, what their functions are, and (especially for higher-end cards) what kind of power budget users are willing to accept. It's unusual for the newest mid-range card not to match the previous high-end card.

Intel, on the other hand, has never released a CPU with TDP over 150W (AMD had a couple at 220W), even though most in the overclocking community know that 5GHz is regularly attainable on modern CPUs. And that's been mostly true since 2013.


You make some good points -- and to be candid, I didn't touch on the network effect(s) because I was trying to maintain some brevity / focus in my original post.

Aside -- I grew up in the days before network reliance (or even network presence) consequently I don't think of my 32GB / 8 core / 8TB / 32" monitored computer as an enhanced VT52. :) Though I certainly respect the fact that for many people, once you exceed the grunt required to render HTTP(S) at an acceptable speed, there's no great interest in a faster computer.

Your observations on Intel -- is it possible they were a bit more prescient than we typically give them credit for, insofar as not pursuing ever faster CPU's (which would now have been considered relatively unnecessary for the majority of installations)?


I remember reading this news (and perhaps even this exact article - it certainly feels familiar) back when it was released and being so excited as a 15 year old gaming nerd. It really was a time when stuff was changing so rapidly - and those changes were resulting in massive, really obvious performance gains - that you couldn't help but just look forward to where it might go if the pace continued. Everything was about the big frequency numbers and it was easy to buy into the hype.

It'd be interesting to know the sort of IPC delta between a Netburst architecture and Kaby Lake. Also, it'd be interesting to know how theoretically fast you'd have to push Netburst to see the same performance as Kaby Lake.


This is from 2011, but it should give you an idea: http://www.tomshardware.com/reviews/processor-architecture-b... .

tl;dr: A Core i7-2600K @ 3GHz is about 2.67x faster than a Pentium 4 HT 660 (Prescott µarch) @ 3GHz. In the benchmark they tested single core, at equal clocks, to keep everything as level as possible.

As for single-core performance, Kaby Lake would be between 30 to 40% better than Sandy Bridge (which could be called the peak-improvement architecture in the Core i brand), so it'd be about ~3.5x faster than the Prescott-based chips.


I'm surprised Intel made this bold claim so late. When I was in college in the late '90s, my computer architecture prof said the physics problems get really hard starting around 4 to 8 GHz. He said it as if it was common knowledge in the industry.


Damn. The linked article says nothing about Intel claiming 10GHz, and the author is just speculating that, applying a 9x speed increase to NetBurst (the same 9x they saw with the P6), 10GHz should be doable. But then the link at the bottom, "FACTS FROM INTEL", specifically mentions 10GHz, presumably from the horse's mouth.


Marketing people write statements like this. They don't ask for conservative estimates.


Just to give a little intuition for exactly how fast your CPU runs (assuming ~3 GHz), a single cycle takes about as much time as it takes a photon to travel from your monitor to your eyeball.


299,792,458 meters/second ÷ 3,000,000,000/second = 9.99 cm

Just how close are you sitting to your monitor?


Perhaps I need glasses. :)


That's helpful. How many football fields is that?


I'm partial to Olympic sized swimming pools myself.


Yards or Meters?


That's only true if your monitor is 4 inches away from your face.


The funny part is I'm reading this article almost 2 decades later, on a CPU that's probably throttling itself down to the 1.4 GHz level... the top of the line then.


And yet, it's probably over a factor 10 faster and a factor 10 more efficient. (random numbers, I don't actually know how cpus have scaled or if they can accurately be compared)


Anecdotally, I've heard that Everquest 2 (released 2004) is still impossible to run at 60FPS with all graphical settings maxed; the devs attempted to future-proof it with speculative high-end graphics options, but they optimized for high-GHz processors which never arrived.


No single GPU could run Crysis (2007) at 1080p / 60fps until last year!


And it didn't look that good. For the time it was fine, but it hasn't aged that well. Guess that is a good argument against future-proofing your game?


I would argue no mmo ever made has ever truly run at 60fps but that's another story I guess :p


Over a decade later, even overclockers haven't managed to reach 10GHz --- but some have come close:

http://www.tomshardware.com/news/amd-fx-8150-overclock-9ghz-...

At 10GHz, light travels approximately 3cm during each period of the clock.

Note that transistors which operate above 10GHz are not rare and are used in microwave applications; as I understand it, the difficulty is in creating logic circuits with them and at a scale suitable for a CPU.


The difficulty is thermal. Microwave stuff operates in the linear region, extremely fast and often extremely inefficient. Logic stuff almost always operates in saturation mode, where it's very slow but efficient. Old timers will remember emitter-coupled logic (ECL), which went to great efforts to operate in linear mode at amazing power use; 100 MHz IBM mainframes needed water cooling back when 2 MHz Z80s were considered fast. You could make something like ECL circuits using modern ICs, but the thermal load would be like the surface of the sun, and it's cheaper to break the problem down into 10000 parallel processes than to make some kind of ECL.


Funny you say that given I discovered ECL looking up ways to do fast CPU's on human-verifiable nodes. Might try to make some of them in the future.


> 10GHz, light travels approximately 3cm between each period of the clock

Light doesn't change speed, so this statement confuses me.


> Light doesn't change speed, so this statement confuses me.

Practically speaking, it does change speed. The speed of light in a vacuum is fixed, but it varies based on what it's traveling through.

The propagation velocity of an EM signal in RG coax cable is 80% of c, PCB traces can be as low as 50%, and somewhat surprisingly both fiber optic cable and cat6 cable are about the same, ~60-70%. I don't know if there is any good public information about the velocity factor of modern CPU transmission lines.
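
For a feel of the numbers, a quick sketch of how far a signal gets in one 10GHz clock period at those velocity factors (my own illustration):

    c = 299_792_458                 # m/s, speed of light in vacuum
    period = 1 / 10e9               # one 10 GHz cycle = 100 ps
    for medium, vf in [("vacuum", 1.0), ("RG coax (~0.8c)", 0.8),
                       ("PCB trace (~0.5c)", 0.5)]:
        print(f"{medium:18s} {vf * c * period * 100:.1f} cm per cycle")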


The frequency is what's changing. How far does light travel in 1/10,000,000,000 of a second?


10GHz is the clock frequency, not the light frequency ;)


I see three main failed predictions:

  - 10 GHz
  - long pipelines, I think we're currently at around 14 stages, which is more than P6 but less than Netburst (Pentium 4)
  - EUV lithography is still not a thing at 10 nm
Did I miss any important failed predictions?


Slightly off topic, this reminded me of the whole over-clocking scene in the 2000's, being excited over these things. :)

http://valid.canardpc.com/records.php

https://www.youtube.com/watch?v=UKN4VMOenNM (the world record at 8.429GHz)


> Realistically speaking, we should be able to see NetBurst based processors reach somewhere between 8 – 10GHz in the next five years before the architecture is replaced yet again

was a statement made by AnandTech. Did Intel actually strive for those numbers?


Yes, Tejas was designed for >7 GHz, but it probably would have shipped at 4-5 GHz (while providing similar or lower performance than a 2.x GHz Conroe) due to poor transistor scaling.


Yes, click at the link at the bottom right of the article (Next) http://www.anandtech.com/show/680/7


Whoops thank you! I guess I have to look for the rest of the article next time.


Haven't we reached the point of diminishing returns regarding processor speed? I believe this to be true for the consumer market.


Why would that be the case? If we could run CPUs at 50GHz with no problems, why wouldn't we?

Edit: I probably exaggerate. I just mean that higher clock speed still have benefits.


I think this "conveyor example" from Intel is a good explanation as to why: https://software.intel.com/en-us/blogs/2014/02/19/why-has-cp...

I think that you would cease to see any acceleration from increased clock speed unless you also sped up other parts of the processor / computer.

I think your sentiment is correct. If we can take advantage of higher clock speeds, then why not do it?


Reminds me of something Seymour Cray (allegedly) once said: "Anyone can build a fast processor - the trick is to build a fast computer around it". Given the performance boost an SSD gives, I imagine there's some room left for improving overall system architecture without increasing the clock speed of the CPU or replacing the CPU at all.

(Caveat lector: Quoting from memory here)


Part of it is the balance of value between hardware and software companies. When hardware stagnates, software has to become increasingly complex, and software companies dominate.


If you only use your computer for web browsing, then sure.

But the consumer market is more than just this. Several popular "consumer" applications, such as gaming and photo/video editing, continue to see benefit from increases in CPU performance, and single-core performance (what most people actually want when they ask for higher clock speeds), continues to be very relevant to this day. Not all workloads are easily parallelizable.


Work grows to fill the available time. "IT" style tasks, like reading/writing documents, spreadsheets, emails, etc. don't require anything like the resources they currently use, or those they used in 2000. They use(d) those resources since there was no reason not to.

Lots of (most?) software development typically charges ahead without much regard to resource usage, until performance becomes a problem; then things are optimised until performance is no longer a problem, and the charge resumes. This results in software with performance which is just about acceptable, regardless of what resources are available. It was the case in 2000, it is the case now, and it would be the case if we had 50GHz machines.

This is the case for tasks where the main bottleneck is 'has anybody bothered to implement this yet?'; I'd say your examples of gaming and video editing are tasks where performance is a major part of the bottleneck. Arguably, Trello didn't exist in the 90s because nobody had bothered to make it yet; Skyrim didn't exist in the 90s because the machines weren't up to it.


For desktop computing by normal people? Maybe, but there are a lot of people and businesses that can always use more CPU power. High-end CPUs will also eventually trickle down to lower-powered devices, and I still think modern ultrabooks could be faster than they are.


Interesting. One day, maybe one day, Intel will release a desktop processor that is worth upgrading to from my 6 year old 2600K. One that doesn't cost $1723, that is.


Define worth?

They have released a processor that is worth upgrading to: https://ark.intel.com/products/97129/Intel-Core-i7-7700K-Pro...

https://ark.intel.com/products/52214/Intel-Core-i7-2600K-Pro...

There are a massive host of improvements between those two processors. Each release since Sandy Bridge has continued to increase performance between 5 to 10% in most tasks and more in some specifics. Over 4 releases that is a noticeable effect.

That processor in the US is 350 dollars.


So, what was the reason CPUs could not scale vertically (higher frequency)? Temperature? Stability?


Each switch of a gate causes energy to be expended. When a transistor switches, there is some current flowing and some voltage across the source and sink. That voltage * current is greater than zero, and therefore requires power. This is opposed to the stable non-switching state where there is no current and therefore no power.

So, because each switch consumes the same amount of energy, the power consumed is directly proportional to the switching frequency. Double the frequency = double the power.

As other posters have alluded, Intel tried to work on the power issue by decreasing the voltage. Unfortunately, they were not able to figure out how to decrease the voltage as they had in the past. With the high power required to run high frequency circuits, cooling would have been more of a problem than could be overcome economically.


it's even worse - transistors require more voltage to reliably switch at higher frequencies, e.g. http://images.anandtech.com/reviews/cpu/intel/22nm/powersm.j...


And to get even more pedantic, it's even worse than that alone implies because switching power increases with the square of voltage.

We're also at the point where transistor leakage current matters. All transistors leak a tiny amount of current even without switching. Alone this might only be a few µA, but pack enough of them together on a die and that might add up to hundreds of mA. Combine that with higher voltages so the chip can run at high clock speeds, and the fact that leakage increases with temperature (which will be driven by higher voltages and frequencies), and all of a sudden you have a feedback loop which can cause you to burn a few extra Watts before you've even done anything.
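
To put the V^2 * f scaling in numbers, here's a rough sketch using the Cedar Mill figures quoted at the top of the thread (3.6GHz at 1.3V stock vs. 9GHz at 1.4V overclocked), looking only at dynamic power and ignoring leakage:

    # Dynamic power ~ alpha * C * V^2 * f; with alpha and C fixed, only the
    # V^2 * f ratio matters for a relative comparison.
    def relative_power(v, f_ghz, v0=1.3, f0_ghz=3.6):
        return (v / v0) ** 2 * (f_ghz / f0_ghz)

    print("3.6 GHz @ 1.3 V -> 1.00x (reference)")
    print(f"9.0 GHz @ 1.4 V -> {relative_power(1.4, 9.0):.2f}x the switching power")

Nearly 3x the heat in the same die area, before leakage makes it any worse.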


If I understand correctly, Intel CPUs around that time were super deeply pipelined so they could meet timing closure at such high frequencies. So yeah, you can keep pushing the clock speeds up and up, but now a mispredicted branch really shoots you in the foot. It made sense at the time because people associated clock speed directly with performance. Higher frequency more better. Intel kinda gave up on that at some point and started over with the Core architecture design, which has a lower clock speed but does more in each pipeline stage.


Those very deep pipelines are the reason NetBurst was very good at some tasks but not so great for general use. They made branch mispredictions really expensive, as a pipeline flush would cause a large number of cycles (~20 IIRC) to be "wasted".


Haha, yeah P4 had multiple pipeline stages just for propagation delay.


There is also another fundamental limit, the speed of light.

Because of special relativity, nothing, including electrical signals, can transmit information faster than the speed of light (c = 3*10^10 cm/s). So if the maximum dimension of your CPU is, say, 2cm (I'm ignoring that they actually can have multiple smaller cores), then the maximum physically achievable frequency is upper bounded by c/size = 15 GHz.

Although it has not been reached yet, it has the same order of magnitude as the record frequencies achieved by overclockers using liquid nitrogen cooling (something a little below 9 GHz). It also shows that reducing transistor size, and as a consequence overall CPU size, increases the maximum physically allowed frequency.
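
A one-liner version of that bound (same assumptions: a signal at c must cross a 2cm die within one clock period):

    c_cm_per_s = 3e10      # speed of light
    die_cm = 2.0           # largest dimension of the die
    print(f"upper bound: {c_cm_per_s / die_cm / 1e9:.0f} GHz")   # -> 15 GHz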


Thank you for your synaptic sugar this morning. It saved an incredibly boring meeting during a long track implementation project.


It's very easy to design a CPU such that signals do not have to cross it in one cycle. The speed of light is unlikely to ever cap the speed of a CMOS processor.


Mostly heat dissipation. Assuming you hold voltage constant, heat increases linearly with frequency. Also, as process nodes get smaller, you increase the # of transistors per square inch, so a jump to a new node can also increase heat generation quadratically if all else stays constant.

As other commenters mentioned, you can counteract this with better cooling, but eventually the marginal $/watt of cooling isn't worth the tradeoff of lower speeds & multicore processors.


> Temperature? Stability?

Yes. Notice how after it was clear that P4 was a dead end they went and dusted off their P3 mobile line that they had been making more power efficient the whole time?


Technically it was heat, but mostly it was due to (i) economics: there was less demand for faster clock speeds, otherwise more research could have gone towards solving the heat problem; and (ii) each CPU cycle became more efficient, with the ability to execute multiple instructions in a single cycle and with more efficient instruction sets.

Surprisingly, power consumption also made a huge impact. As tablets and laptops got more popular than desktops, battery life became a major concern and thus TDP played a major role in research.

Try this fun experiment: Underclock your CPU by half a GHz and see if you notice the difference in your day to day work.


No, it was only because of power density i.e. too much heat dissipated in a really small area. There is no way to "solve" this issue, other than to just throw more cooling at it. And since more cooling = more money, Intel (and friends) went down the multicore route instead.

No amount of R&D spending can bend the laws of physics to overcome the inherent limitations of silicon. I'm sure Intel also looked into alternative semiconductors (e.g., III-V) before giving up on the 10 GHz dream.


Single-thread performance is as important as it has ever been.

That a secretary typing a document or someone who only spends time on facebook doesn't notice the difference is irrelevant- consider, for example, the massive capital outlay by the financial industry to have servers located as closely to the world's trading hubs as possible. If they are willing to pay whatever it takes to shave milliseconds off a round trip, faster CPUs are a part of that equation.


> faster CPUs are a part of that equation.

I think the GP did not debate that, but pointed out that for CPU speed/throughput, clock speed is only part of it. Adding functional units and allowing the CPU to process more instructions in parallel can have a big impact; so can, e.g., a larger cache, better branch prediction, and so forth.

If you give people faster CPUs, they will cheer and find something to keep them busy. ;-) And for some people, there is no such thing as "fast enough". But for a fairly large share of desktop/mobile users, the CPU is not the limiting factor as much as memory bandwidth and I/O.


I don't disagree with that statement in a general sense. But what earns Intel its money and marketplace dominance? The cheap Celeron/Pentium-class chips sold in bargain laptops & Best Buy specials? Or the high-end, single-thread performance chips?


> Otherwise more research could have gone towards solving heat problem. (ii) Each cycle of CPU was more efficient with ability to execute multiple instructions in a single cycle and with more efficient instruction sets.

Dude, Intel spends something like $80B/yr on R&D. This is closer to hitting fundamental laws of physics barriers.

They killed off their P4 line and developed their mobile line for a reason.


the $80B a year in R&D is off by an order of magnitude.

https://www.fool.com/investing/2017/02/05/intel-corporation-...


Isn't an order of magnitude 10x? According to your link they spent $12.74 billion in 2016.


Not necessarily, it could be e if you're using natural logarithms. Anyway: log10(12.74) = 1.10, log10(80) = 1.90. So, it's a little less than a full order of magnitude, but pretty close.


Although not normally used for smaller amounts it can go in both directions. ~1/10th is still an accurate if archaic use of the term.


Indeed, that's more than Intel earned in total revenues in 2016...


I accidentally underclocked my old CPU (Athlon 651K) to 800MHz and found out after about 2 weeks when I bought The Vanishing of Ethan Carter. Other than that it was fine, sometimes a little slow, but comfortable.


Heat, mainly, and the shift toward mobile devices, which implies a need for good battery life and power efficiency.


Not enough time for electrons to move. Photons - another story.


This comment makes no sense. Neither electrons nor photons are what move through wires to propagate electrical signals.


I'd definitely say electrons propagate through wires to carry signals. The electrons charge the gates on the fets which opens or closes the channel, which allows electrons to flow through another wire. I wouldn't say it's terribly explanatory tho


Well that is not correct @gaze.

Although your light turns on very quickly when you flip the switch, and you find it impossible to flip off the light and get in bed before the room goes dark, the actual drift velocity of electrons through copper wires is very slow. It is the change or "signal" which propagates along wires at essentially the speed of light. refer https://en.m.wikipedia.org/wiki/Drift_velocity


What do you mean? Electrons move near the fermi velocity.


The fact that drift velocity is nonzero means that electrons are indeed moving as @gaze says.


Electrons are always moving, at the fermi velocity.


All the theory behind microelectronics is about electrons, "holes", and how fast and where they move.


However, you need a certain amount of electrons/charge to flow to flip a gate, because semiconductors work by filling and emptying voids in crystals. So gate speed is proportional to current, which is proportional to voltage, which is proportional to the square root of power.


They were scaling vertically because the GHz game was a marketing strategy and not an engineering strategy. But when the pipeline got so deep that it didn't make sense (branch misses would incur huge penalties) and AMD stole the spotlight, marketing finally gave way for the engineers.


The current generation of Intel chips can be overclocked comfortably to 5GHz with a simple water-cooling setup.


You only have to risk frying your computer and destroying your data if the water cooler springs a leak.


Glad to see the correct answer at the top here. The "fireball" section of the P4 did run at double the frequency of the rest of the processor. So internally that was considered the processor frequency.

One of my first contributions to the Linux kernel was a bugfix to the bogomips routine. It stored the result in a 32-bit variable and our 8+ GHz chilled test machines would cause that to wrap.

It was then determined that the processor would be marketed using the slow clock frequency. This was the right answer, but it didn't feel like it at the time.

That article predates the marketing change. It might have made them realize they needed that change.


We may have to move beyond silicon to get 10GHz.

For general-purpose single-threaded IPC, we've basically had little to no improvement since Sandy Bridge apart from clock speed. SSD / IO speed helped to fill the performance improvement gap in the past 4-5 years.

Now we are waiting for the next big improvement to come. If there are any.


Best so far is 8.7GHz, with an AMD chip which is a bit old now. But Kaby Lake is at 7.3GHz. All using liquid nitrogen, of course. http://valid.canardpc.com/records.php#js-freq_all


I've always wondered why the need for GHz; isn't MIPS what you're looking for? Wouldn't a 50MHz 1000-core CPU do well with decent parallelism in a compiler?

How many cycles does the average method/function/procedure need anyway?


The standard response is to quote Amdahl's Law https://en.wikipedia.org/wiki/Amdahl's_law

If 90% of your runtime can be done in parallel, you still have to wait for that last 10%. You can hit this limit with 10 cores (1 processing the 10% that's serial, the other 9 processing the 90% in parallel). If you throw 1000 cores at the problem, you'll have 999 cores processing the 90% in parallel, each performing 0.09% of the workload, but you'll still have 1 core doing the 10% that's serial. Those 999 cores will be idle for 99% of the time, waiting for that last core to finish.
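
Here's that arithmetic as a tiny sketch (nothing more than the formula):

    # Amdahl's Law: speedup = 1 / (serial_fraction + parallel_fraction / cores)
    def amdahl_speedup(parallel_fraction, cores):
        return 1 / ((1 - parallel_fraction) + parallel_fraction / cores)

    for cores in (1, 10, 100, 1000):
        print(f"{cores:5d} cores -> {amdahl_speedup(0.90, cores):5.2f}x speedup")
    # The serial 10% caps the speedup just under 10x, no matter how many cores.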

The counter-argument is Gustafson's Law https://en.wikipedia.org/wiki/Gustafson%27s_law

This says that people don't choose a particular task, then wait for a computer to do it. Instead, the choice of which task to perform depends on what the computer can manage. Hence the user of a 1000 core machine will choose to do different tasks than the user of a 10 core machine, or a 1 core machine.

Whilst Gustafson's Law is clear from experience (a PlayStation 4 isn't used to run Pacman really fast), Amdahl's Law is the one that's relevant for compilers: a "sufficiently smart compiler" can alter your code in all sorts of ways, but the resulting executable must still perform the same task (otherwise it's a bug!).

There might be an approach based on e.g. writing an abstract specification and deriving a program which is suitable for the given hardware, but that's a long way off (for non-trivial tasks, at least).


The main problem is that generic parallelism is hard, and doesn't give you a straight X multiple speed up[1].

1. https://en.wikipedia.org/wiki/Parallel_computing#Amdahl.27s_...


What ever happened to quantum computing, DNA computing, and optical computing?

Are consumers ever going to see a general purpose computer based on any of those technologies on their desktops?


> Quantum computing

Still gradually in development, presumably coming someday. It's very difficult to get quantum systems of any real complexity to not decohere before they can do useful computation.

> DNA computing

DNA-based storage apparently exists in the lab and might exist someday: http://arstechnica.com/information-technology/2016/04/micros.... Current costs are something like $40 million/GB, so that'll have to come down "a bit". I don't know of anyone doing computation with DNA, biology seems too slow for that.

> Optical computing

It's very difficult to make light interact with other light (which is a sine qua non for computation) except inside a material with electrons, where you get significant losses and there's no real advantage over just plain old transistors. Not likely to become a useful technology, IMO.


It is about having vision, really. Appreciate their vision from back then, and embrace what they have achieved so far.


Well, where can I get mine?


Well, this would be less then, wouldn't it?


Intel can't beat the heat.




