I have to say a big [citation needed] to the claim of ARM beating x86 high end chips on performance per watt, at least on general workloads.
I think it's common to extrapolate Atom vs ARM to Xeon vs ARM in HPC, without thinking through the implications. We may well get higher performance/watt for single threads under ARM - I'm not disputing that, especially for integer work.
However, Amdahl's law is going to raise its head. In the same machine, a higher number of lower performance threads is going to cause lock contention. You'll also have to split computations over more boxes, since the absolute performance of an Intel server will remain far higher (by 2014, we're talking 64 core/128 thread Haswell). Both are likely to be a massive tax on performance.
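To put rough numbers on the Amdahl's law point, here's a quick back-of-the-envelope sketch. Everything in it is an assumption for illustration: the 95% parallel fraction, the 0.4x per-thread speed for the "slow" cores, and the thread counts are made up, not measurements of any real ARM or x86 part. The function names are mine too.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Amdahl's law: speedup over one thread when only part of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


def throughput(per_thread_perf: float, parallel_fraction: float, threads: int) -> float:
    """Relative throughput: single-thread speed times the Amdahl speedup."""
    return per_thread_perf * amdahl_speedup(parallel_fraction, threads)


# Hypothetical machines: 16 fast threads vs 64 threads at 0.4x the speed.
# On paper the second box has more raw capacity (64 * 0.4 = 25.6 vs 16).
fast_few = throughput(per_thread_perf=1.0, parallel_fraction=0.95, threads=16)
slow_many = throughput(per_thread_perf=0.4, parallel_fraction=0.95, threads=64)

print(f"16 fast threads: {fast_few:.2f}")
print(f"64 slow threads: {slow_many:.2f}")
```

With those made-up inputs the 16 fast threads come out ahead (roughly 9.1 vs 6.2), even though the slow box has 60% more raw capacity. It doesn't model lock contention or cross-box communication at all, so if anything it flatters the many-slow-threads case.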
To fight this, performance per core is likely to see a substantial rise, both from higher clock frequencies and from more complex individual cores. However, this will directly work against the two things that make ARM performance/watt so impressive currently: low clocks and simple cores.
Also, Intel and AMD are both built around making those 100 watt scale processors, and making them fast. Intel really stumbled entering the Atom market, both because of a weak design (the chipset drew more power than the CPU itself!) and because of a lack of commitment (using 2-4 year old process nodes).
I think we're likely to see similar teething pains with companies trying to enter the server market with ARM. The institutional knowledge just won't be there. Making a cache architecture that effectively feeds 64 cores? That's way different from improving power drain on a mobile CPU for the seventh generation running. I expect it will be at least a few generations before design teams are fully up to speed.
I'm not saying we won't see certain workloads that are better off under ARM; memcached and static HTTP serving are both likely to do well, since they're effectively just shuffling bits around, aren't particularly CPU intensive, and are embarrassingly parallel. But I believe they'll turn out to be the exception, not the rule.
Which is to say, there's nothing magic about ARM that will let them beat x86 at the high end. They'll have to fight for it, and against Intel and AMD on their own turf no less.
[I copied this from a post I made a few months ago after I realized I was typing out basically the same thing]