A lot of it is down to the Pentium 4 turning out to be a dead end. Intel designed it to scale to 10 GHz, with a massively long pipeline. That turned out not to be possible, so all the design sacrifices made along the way (primarily that ridiculously long pipeline) turned out to be bad bets.
When Intel introduced Core 2 Duo, performance per clock in many cases doubled, on the same socket and process node. I'm unaware of a precedent for that, at least in recent history.
Then Intel a couple of years later rolled out Nehalem, with an integrated memory controller and hyperthreading, cementing their advantage in the server market. AMD has been playing catch up ever since.
If Intel's chips were half the performance today, AMD would be winning; though not by quite as much.
>Then Intel a couple of years later rolled out Nehalem, with an integrated memory controller and hyperthreading, cementing their advantage in the server market. AMD has been playing catch up ever since.
Before Intel essentially copied HyperTransport (HT) from AMD with QPI (I believe the Nehalems were the first QPI Xeons), AMD servers nearly always drew dramatically less power than the FB-DIMM-based Xeon servers.
Also note, in the early days of hyperthreading, it was a great way to run your two active processes on one core while your second core sat idle. My understanding is that even now, in the best case, it's not a particularly huge boost.
I mean, yeah; between the release of the QPI Xeons and now, for most things, Intel has had the superior chip. But before QPI? Man, if you paid for your own power, AMD was dramatically superior for high-RAM applications.
Hyperthreading is better now. They even had to add another instruction to fix some of the issues. Plus a lot of work on the scheduler. Not worth disabling now!
It's interesting to wonder if Intel went down the P4 path knowing AMD would follow them into oblivion. For years after Intel turned the ship around, AMD continued selling hot, power-hungry chips (now with 900 watts of power!).
Until very recently, AMD basically conceded the entire mobile market to Intel, which it turns out is also a profitable market to dominate.
The big one is they fell behind on process technology. AMD's fab operations (now GlobalFoundries) are at this point a full process node behind what Intel is doing. So as AMD releases its bleeding-edge 32nm parts, Intel has long since worked out the kinks and is selling 32nm CPUs at budget prices. And simultaneously Intel is selling better-performing 22nm parts into the high end at much better margins.
During the Athlon/Opteron days, the silicon capabilities were much closer (does everyone remember the "paper launches" of various PIII speed bins, and of course the recall when they pushed a little too far?).
At the time, ATI's market cap was half of AMD's. Too much of AMD's capital got tied up in the acquisition and R&D fell off.
The overall vision is still very, very compelling (integrated FP co-processing). At the time, AMD figured it could beat Intel to the punch by buying one of the best FP companies out there. Instead they both (AMD and ATI) fell behind, as merging the two companies took far more time, attention and capital than was originally anticipated.
I'm not convinced. The AMD product line fell into a ravine very soon after the acquisition -- CPU design has half a decade of lead time, so their problems predated ATI.
I think AMD's success is simply better viewed as Intel stumbling. The P4/Itanium split product lines were designs defined more by marketing and ideal market segmentation than they were by engineering, and they both also had some really bad design decisions. This caused Intel to fall from their technology leadership position for ~5 years, during which AMD produced steady, if slow, improvement, eventually overtaking Intel. Since Intel got its act back together, AMD hasn't been able to catch a break.
At this point, it just comes down to Intel spending more on R&D (http://newsroom.intel.com/community/intel_newsroom/blog/2011...) than AMD makes in revenue (http://www.anandtech.com/show/5465/amd-q411-fy-2011-earnings...). Intel is just throwing money at rapid-release fabrication tech, and their tick-tock cadence has been very effective (it had to be, after they got wasted by the Pentium 4 stagnation). Even if AMD hadn't blown tons of market cap on buying up ATI, they would still be facing an Intel that is outright an order of magnitude larger than them. They struck gold with the Athlon 64 line, and that is the only reason they ever overtook Intel, which, as other commenters mentioned, came down to Intel's own hubris.
2) Memory controller/north bridge into CPU - correct
3) lower price point than Intel
They had a better product, at a lower price == market share.
Mobile:
They missed (as did Intel) the ultra-low power/mobile market
1) ARM owns the market here
Bulldozer:
Made their bet on a high throughput/high latency micro-architecture, which didn't pay off. So they're now stuck with uncompetitive technology at a lower price point, which to the market == cheap knock off.
To make it worse, their process technology is a generation behind Intel, which increases costs and thus eats further at any price discount.
GPU:
Meanwhile, Intel is eating their "Fusion" style CPU/GPU integration with every release they make. Can't remember the last time there was some buzz about AMD Fusion, but Intel HD 2k/3k/4k is in the news all the time.
They should have made Pat Gelsinger CEO when he was turned down for the same job at Intel.
Looks like too little too late. If Steamroller is still a year or more out, that will put it in competition with Haswell (DDR4 in the server part, AVX2 with FMA ops, etc.). Considering how much Ivy Bridge is already dominating and those kinds of memory bandwidth and FP throughput increases, Haswell looks like it will be a monster.
I don't follow AMD's stuff much beyond knowing that Bulldozer was not very competitive beyond the $100-120 price point. Is Steamroller set to change that or is AMD dying a long, slow death?
Also, is AMD going to do anything in mobile or is that all ARM moving forward?
ARM is introducing 64-bit chipsets in a year or less. The power savings will be worth the switch for cloud hosts. Most devices sold already run ARM.
I have to say a big [citation needed] to the claim of ARM beating x86 high end chips on performance per watt, at least on general workloads.
I think it's common to extrapolate Atom vs ARM to Xeon vs ARM in HPC, without thinking through the implications. We may well get higher performance/watt for single threads under ARM - I'm not disputing that, especially for integer work.
However, Amdahl's law is going to raise its head. In the same machine, a higher number of lower performance threads is going to cause lock contention. You'll also have to split computations over more boxes, since the absolute performance of an Intel server will remain far higher (by 2014, we're talking 64 core/128 thread Haswell). Both of these are likely to be a massive tax on performance.
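To put rough numbers on the Amdahl's law point, here is a minimal Python sketch. Everything specific in it is an assumption for illustration: the 95% parallel fraction, the 16 and 64 thread counts, and the 4x per-thread performance gap are made up, not measurements of any real chip. It also only models the serial fraction, not lock contention or cross-box communication, which would make things worse for the many-slow-threads case.

```python
def amdahl_speedup(parallel_fraction: float, n_threads: int) -> float:
    """Speedup over one thread when only `parallel_fraction` of the work scales."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_threads)


def effective_throughput(per_thread_perf: float,
                         parallel_fraction: float,
                         n_threads: int) -> float:
    """Per-thread performance scaled by the Amdahl speedup."""
    return per_thread_perf * amdahl_speedup(parallel_fraction, n_threads)


# Hypothetical machines with the same raw aggregate throughput (16 * 1.0 == 64 * 0.25).
PARALLEL_FRACTION = 0.95  # assumed, not measured
few_fast = effective_throughput(1.0, PARALLEL_FRACTION, 16)
many_slow = effective_throughput(0.25, PARALLEL_FRACTION, 64)

print(f"16 fast threads: {few_fast:.2f}")
print(f"64 slow threads: {many_slow:.2f}")
```

With these assumed inputs the few-fast-threads box comes out well ahead even though the raw aggregate throughput is identical, because the serial 5% runs at the slow thread's speed on the other box.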
To fight this, performance per core is likely to see a substantial rise, both in clock frequencies and as a result of single core complexity. However, this will directly work against the two things that make ARM performance/watt so impressive currently.
Also, Intel and AMD are both built around making those 100-watt-scale processors fast, and making them well. They really stumbled entering the Atom market, both because of a weak design (the chipset drew more power than the CPU itself!) and because of a lack of commitment (using 2-4 year old process nodes).
I think we're likely to see similar teething pains with companies trying to enter the server market for ARM. The institutional knowledge just won't be there. Make a cache architecture that effectively feeds 64 cores? Way different to improving power drain on a mobile CPU for the seventh generation. I expect it will be at least a few generations before design teams are fully up to speed.
I'm not saying we won't see certain workloads that are better off under ARM; memcached and static http serving are both likely to do well, since they're effectively just shuffling bits around, aren't particularly CPU intensive, and are embarrassingly parallel. But I believe they'll turn out to be the exception, not the rule.
Which is to say, there's nothing magic about ARM that will let them beat x86 at the high end. They'll have to fight for it, and against Intel and AMD on their own turf no less.
[I copied this from a post I made a few months ago after I realized I was typing out basically the same thing]
Why do you assume ARM-based processors will necessarily have a higher perf/W than x86? This has been a common claim (usually because of the perceived size of the x86 instruction decoder), but Medfield has proven that to be false:
Benchmarks and lies, yada yada. Atom wins on some things (in particular it tends to kick the A9's butt on JavaScript benchmarks, which rely on single-threaded dispatch and high clock speeds) and loses on others (it's a single core with hyperthreading, where most ARM SoCs are dual core).
Actually depending on the benchmark, the low-clocked Ivy Bridge CPUs tend to do quite well in "performance per watt" vs. ARM SoCs too. They lose big in idle power, but under load those enormous L3 caches and the uOp cache can give them 2-3x the performance per clock of the in-order A9 (and they run about 2x as fast, and draw about 4-10x as much power at peak, so it actually comes out very (!) roughly even).
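The back-of-envelope arithmetic there can be spelled out in a few lines of Python. This is only a sketch of the ratios quoted in the comment (2-3x performance per clock, about 2x clock, 4-10x peak power); the function name is made up for illustration, and the inputs are the comment's rough estimates, not benchmark results.

```python
def relative_perf_per_watt(perf_per_clock_ratio: float,
                           clock_ratio: float,
                           power_ratio: float) -> float:
    """Perf/W of chip A relative to chip B, given A/B ratios for each factor."""
    relative_performance = perf_per_clock_ratio * clock_ratio
    return relative_performance / power_ratio


# Worst case for Ivy Bridge: low end of the IPC range, high end of the power range.
worst = relative_perf_per_watt(2.0, 2.0, 10.0)
# Best case for Ivy Bridge: high end of the IPC range, low end of the power range.
best = relative_perf_per_watt(3.0, 2.0, 4.0)

print(f"Ivy Bridge perf/W relative to A9: {worst:.1f}x to {best:.1f}x")
```

The range straddles 1.0, which is what "very roughly even" amounts to: anywhere from somewhat worse to somewhat better than the A9, depending on which end of each estimate you believe.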
ARM has a long way to go before they are legitimately competitive in the server space. But Intel still isn't anything more than "broadly competitive with 2-year-old devices" in the mobile world. Over time I'd expect the architectures to converge from both directions, but I don't feel lucky enough to guess at which one will "win".
Oh, I'm not claiming that x86 is going to win in mobile or anything like that. My only point is that based on devices shipping today, the choice of ISA does not cause huge power efficiency disparities and that there's no reason to think that will change going forward.
It is very unclear. Performance per watt on server workloads on currently shipping ARM hardware is very poor, and IO performance is terrible. I'm waiting to get hold of some of the new server-designed systems (Calxeda etc.) once they actually ship. Most ARM hardware is two process generations behind Intel. I don't think 64-bit helps that much. A large memory system will consume much of its power in the RAM, so the difference between ARM and Intel per gig is smaller, especially as no one is likely to stick 1TB in an ARM system, so you are likely to get more CPUs anyway.
It is possible it will work out, but theory and reality are so far apart right now that I am keeping my options open. I was going to start a business in this space, but I'm holding off on committing at this stage.
When you say "mobile", what segment do you refer to?
AMD has a pair of processors, one named "Bobcat" and one on the way that was just announced named "Jaguar", which are supposedly aimed at light laptops and tablets. I haven't heard of any AMD products aimed at cell phones, though.
Intel, on the other hand, does appear to be looking to get into the smartphone market, so you may see some Atoms in future smartphones trading paint with ARM.
The article mentions "feeding the cores" several times; possibly Bulldozer's cores were sitting idle too much, and this is how AMD plans on improving the performance.
I'm not aware of any AMD ARM products. I don't know if they're working in that space. I've heard that Intel is making ARM chips, even though they still have the x86 crown.
I'm not aware of any ATI mobile GPUs either, while Tegras have been prominent in recent products.
You're right, SeaMicro did Atom and Xeon. Calxeda isn't owned by AMD though. I'd swear AMD bought an ARM-based company, and I thought it was SeaMicro. Must have been their Fusion announcement stuff that I remember.
Intel was making ARM chips a decade ago. They'll never go back to making ARM chips again. If x86 falls, so does Intel. Intel is all-in with x86 now. That's why they are trying so very hard (yet unsuccessfully) to push Atom in the mobile market.
It doesn't matter much if they can keep making incremental improvements at this point. They really need to increase their iteration speed. This update should have been out last year.
1: http://en.wikipedia.org/wiki/Athlon#Athlon_.22Classic.22
2: http://semiaccurate.com/2011/10/17/why-did-bulldozer-underwh...