The key element in the CPU market space is volume. Volume lowers the cost of manufacturing and allows you to spend much more money on R&D, as it is amortised over more devices. While all the big RISC manufacturers had, in principle, better architectures than Intel, in the 90s Intel killed them off one by one thanks to the insane volumes of the PC market. Only on the server did PowerPC and SPARC survive. This is what forced the PowerPC-to-Intel transition; Apple had little choice. I keep wondering what the outcome would have been if one of the large RISC platforms had been made available in more consumer-level products, e.g. by offering an ATX motherboard for running Linux. Volumes would have been much larger.
Another big factor for Intel was that, financed by their huge cash flow, they had the most advanced fabs, so competitors were often 1-2 generations behind in the available processes.
But now a few things have changed. First of all, Intel got stuck with their 10nm process, so they no longer hold the manufacturing lead. But most importantly, TSMC pulled ahead of Intel and offers its services to everyone in the market. For the first time, AMD has a manufacturing advantage over Intel.
And the iPhone happened, giving Apple an almost endless supply of money and a huge volume. Over many years, Apple built up a leading chip-design team. This already paid off big with the iPhone, which has by far the most powerful CPUs in the mobile space. It also gave Apple a clear insight into the advantages of really owning the whole platform: designing the software and the CPUs together.
Offering desktop-class CPUs is of course a large additional investment - so it is not a trivial step. But if Apple is willing to do it, it should be very interesting and would give hope that they have ambitious plans with the Mac, as it only makes sense if they really push the platform.
The big problem with switching CPUs is the instruction set.
For most people, it doesn't matter. But if you are in some niche domains, it really has an impact. I don't expect a smooth transition of libraries such as BLAS or VMs such as the JVM. You can't simply recompile these. You typically need a human to rewrite SSE, AVX and other tricky low level code so that performance stays competitive.
> You typically need a human to rewrite SSE, AVX [...] so that performance stays competitive.
Not true. As I commented a few days ago, there is sse2neon https://github.com/jratcliff63367/sse2neon. For the intrinsics it supports, you only need to add a header to automatically map SSE intrinsics to NEON. There is also simde https://github.com/nemequ/simde. It is a larger project and may be more complete. These projects are still immature for sure, but when ARM Macs become a real thing, we will see better libraries that support SIMD across different architectures.
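To make the mapping idea concrete, here is a minimal sketch in C of what such a header does in spirit. It is not sse2neon's actual code: `add4`, `USE_SSE` and `USE_NEON` are hypothetical names I made up for illustration, and only the intrinsics themselves (`_mm_add_ps`, `vaddq_f32`, and the matching loads and stores) are the standard ones from `<xmmintrin.h>` and `<arm_neon.h>`. I'm assuming a compiler that defines `__SSE__` or `__ARM_NEON` for the respective targets.

```c
#include <stddef.h>

/* Pick a backend from compiler-defined macros (assumed: GCC/Clang/MSVC). */
#if defined(__SSE__) || defined(_M_X64)
#  include <xmmintrin.h>   /* SSE intrinsics on x86 */
#  define USE_SSE 1        /* hypothetical macro name */
#elif defined(__ARM_NEON)
#  include <arm_neon.h>    /* NEON intrinsics on ARM */
#  define USE_NEON 1       /* hypothetical macro name */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for exactly four floats. */
void add4(const float *a, const float *b, float *out)
{
#if defined(USE_SSE)
    /* Unaligned 128-bit loads, one packed add, unaligned store. */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
#elif defined(USE_NEON)
    /* The same three steps expressed with NEON intrinsics. */
    float32x4_t va = vld1q_f32(a);
    float32x4_t vb = vld1q_f32(b);
    vst1q_f32(out, vaddq_f32(va, vb));
#else
    /* Scalar fallback for any other target. */
    for (size_t i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```

Simple cases like this map one-to-one, which is why a translation header can work. The intrinsics without a direct NEON counterpart are presumably where the hand-tuning argument still applies.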
Apple has the Accelerate.framework already, which is hand-tuned per chip-type, and is what most of the libraries call into. I’d imagine a lot of work will have been done to make that as seamless as possible on the new chips.
It’s also kind of useful for a framework team to be able to call up the guy designing the next CPU and say that “this bit here is a bottleneck, what can you do for that?”...
You still need to recompile and relink. And it's not that simple: Apple's implementation of LAPACK, for example, is well out of date - it dates back to 2009.
Accelerate is barely used on Mac for scientific stuff in my experience. People tend to use Intel MKL - see, for example, the Anaconda Python distribution, where all of the NumPy/SciPy libs are linked against MKL.
I think it's one of the reasons Apple decided to make such a great leap with Catalina. It's a testing ground for what will happen when we switch to ARM. It's also a clear message to developers to recompile and test their builds against each new version of Xcode and macOS even if they don't plan any new release. The great pain for the users is often the choice between security updates and using their legacy software that worked perfectly so far.
First, I don't think that fast vector operations are a driving force for much of the Mac market these days. A lot of people who are sensitive to these things are already migrating away because of other wedge issues. I do care, enough to demand Intel's MKL over BLAS, but the optimized vector code issue still doesn't worry me, because my local workloads are lightweight and anything where it really matters is already being pushed out to a server farm somewhere. I've actually been trying to convince my own employer to start letting developers have Linux workstations instead of Macs for a host of other reasons. Notably, I'm just getting tired of having to deal with all the little subtle differences in behavior between Docker's Linux distribution and its Mac distribution. And, as an extension of that, I'd be much more worried about not being able to use the same Docker images in production and development than I am about a minor little thing like how well the vector instructions are being used.
Second, Apple has plenty of resources to handle doing those optimizations themselves. They did it before, with AltiVec, and, while I realize that team was disbanded a very long time ago, I expect the existence of iOS as a gaming platform means that an equivalent team either already exists, or could be ramped up quickly. And I presume that covers the most important factors for what would be noticeable to desktop users, such as Quartz.
People keep forgetting that OpenJDK is only the reference implementation, and between open source, research and commercial JVMs there are around 10 of them, with support from tiny microcontrollers all the way up to exascale HPC CPUs.
...and, if I remember correctly, ARM JVMs were generally slow, and require(d) a per-user payment for everyone you'd distribute to.
I don't know of any fast BLAS/LAPACK implementation for ARM (but I might be wrong).
So something that works well on x86 and is available for free (in both the beer and speech sense) now requires payment, if it is available at all, if I want to support macOS? I guess I'll skip.
OpenJDK has ARM support, including vector instructions support.
As for the other JVMs, or ART coffee flavour for that matter, they are also quite good, otherwise they would have been long out of business.
And I really don't understand the focus on BLAS/LAPACK. If you want to do that kind of work, make a Linux OEM happy; Apple platforms never cared for HPC work.
Apple has been pushing developers to use frameworks rather than hand-optimized vector code since before the switch to Intel, however, and that’s good since AVX is a moving target, too. For the libraries I’ve used, the combination of phones and ARM servers means a lot of them already have NEON support, often very competitive.
For the last decade, too, I’d expect some fraction of the heaviest code to have moved to the GPU.
So, I work in a niche domain - scientific software.
For one thing, you can already get scientific libraries on Linux which run on ARM. That's not too much of an issue. BLAS is an API of which there are many implementations.
The issue is that
(a) it requires everyone to recompile everything
(b) projects which are 'legacy' and are no longer developed just won't ever switch, so that software won't be runnable. If Apple do a Rosetta equivalent, they'll run slowly, but if that project ends (like Rosetta did), that software will just stop working. This is pretty much the same problem as when Apple killed 32-bit x86 - there are many apps that just no longer work.
Constraining ourselves just to the JVM as VM example, there are implementations for almost any CPU out there, including microcontrollers (e.g. MicroEJ).
JVM bytecode already runs on ARM because of Android. Sure, it's not OpenJDK and maybe not even a VM, but there should be more than enough experience to draw on.
> And the iPhone happened, giving Apple an almost endless supply of money and a huge volume. Over many years, Apple built up a leading chip-design team.
I'm not sure if this is the case, but my read on it was always that Apple bought PA Semi to bootstrap its chip design efforts.