I still don't have a clear understanding. Apple also made big gains by integrating performance sensitive stuff on the package. AMD could do the same. I wonder if they are constrained by some hardware-level interoperability requirements across the motherboard, like with chipsets or DMA controllers or whatnot made by different companies. Apple clearly has a nice advantage here.
My understanding from reading around (I can't find sources right now) is that a big chunk of Apple's performance comes from how wide their instruction decode hardware is and how long their instruction pipeline is.
ARM instructions are fixed length so it's very easy to stack up decoders next to each other. You don't need complex logic to figure out where instruction boundaries are, which you do with x86.
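To make the boundary point concrete, here's a rough sketch. The variable-length encoding is a made-up toy (the low two bits of the first byte give the length), not real x86, and the function names are just mine for illustration. The thing to notice is that the fixed-width starts can all be computed independently, while each variable-length start depends on having decoded the previous instruction's length.

```python
def fixed_boundaries(code, width=4):
    """Fixed-width ISA: every instruction start is a multiple of `width`,
    so each decoder can be pointed at its slot without looking at the others."""
    return list(range(0, len(code), width))


def variable_boundaries(code):
    """Toy variable-length ISA (hypothetical, NOT real x86): the low two bits
    of an instruction's first byte encode its length minus one (1..4 bytes).
    The start of instruction N is unknown until instruction N-1 is examined."""
    starts = []
    pos = 0
    while pos < len(code):
        starts.append(pos)
        length = (code[pos] & 0x03) + 1  # serial dependency on the previous step
        pos += length
    return starts


if __name__ == "__main__":
    print(fixed_boundaries(bytes(16)))
    toy = bytes([0x00, 0x01, 0xAA, 0x02, 0xBB, 0xCC, 0x03, 0x01, 0x02, 0x03])
    print(variable_boundaries(toy))
```

Real x86 decoders get around the serial walk with tricks like predecode and length speculation, which is where the extra logic comes from.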
This, added to a super long instruction pipeline, means the M1 can reorder a significant number of instructions (something silly like 2x more than any AMD or Intel CPU) to increase CPU utilisation and thus performance.
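A toy model of why a bigger reorder window helps, in case it's useful. I'm reading the "long pipeline" above as a large out-of-order window, which is my assumption, and everything here (the function name, unlimited issue width, in-order retirement, the latencies) is a simplification for illustration, not how any real core is built.

```python
def cycles_to_finish(latencies, deps, window):
    """Count cycles for a toy out-of-order core.

    latencies[i]: cycles instruction i takes once issued.
    deps[i]: indices (all < i) whose results instruction i needs.
    window: max number of fetched-but-not-retired instructions.
    Assumes unlimited issue width and in-order retirement.
    """
    n = len(latencies)
    done_at = [None] * n  # cycle at which each result becomes available
    head = 0              # oldest instruction not yet retired
    fetched = 0           # next instruction to enter the window
    cycle = 0
    while True:
        # Retire finished instructions from the head, in program order.
        while head < n and done_at[head] is not None and done_at[head] <= cycle:
            head += 1
        if head == n:
            return cycle
        # Fetch more instructions while there is room in the window.
        while fetched < n and fetched - head < window:
            fetched += 1
        # Issue anything in the window whose inputs are ready.
        for i in range(head, fetched):
            if done_at[i] is None and all(
                done_at[d] is not None and done_at[d] <= cycle for d in deps[i]
            ):
                done_at[i] = cycle + latencies[i]
        cycle += 1


if __name__ == "__main__":
    # A slow load (say, a cache miss), eight cheap independent ops,
    # then a second independent slow load.
    latencies = [100] + [1] * 8 + [100]
    deps = [[] for _ in latencies]
    print("window=4: ", cycles_to_finish(latencies, deps, window=4))
    print("window=16:", cycles_to_finish(latencies, deps, window=16))
```

With the small window, the second slow load can't even enter the machine until the first one retires, so the two misses run back to back. With the larger window they overlap.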
AMD can't do this on an x86 CPU because the complexity involved is too high, and I think someone has found comments from them basically saying so.
Yeah, Apple did make gains from a better-integrated package, and some of that would be tricky to pull off elsewhere without additional software support. Some of the other integrations would be tricky to pull off because they would reduce the amount of customisation OEMs could do. Remember, one of the big things Apple integrated was RAM, making it unified memory for all the co-processors. That could be tricky to do unless we move into a world where RAM is always integrated into CPUs.
> Apple also made big gains by integrating performance sensitive stuff on the package. AMD could do the same.
The only thing the M1 has integrated that a typical AMD or Intel laptop CPU doesn't have is a neural net processor. Everything else about the M1's 'integrated architecture' is typical.
Notably the M1's RAM is not integrated as is often incorrectly stated. It's just regular soldered LPDDR4X. And M1's RAM latencies are worse than Intel & AMD's socketed DDR4 latencies, so being soldered and physically close is definitely not providing a performance advantage.
I think the performance sensitive stuff Apple integrated is to some extent dependent on their control of their own software stack, which AMD doesn't have control of.
Yeah ARM isn’t x86. Apple made big gains by leveraging the difference in instruction length and complexity between x86 and ARM.
Something AMD can’t do, and they’ve said as much.