Hacker News

The key element in the CPU market is volume. Volume lowers the cost of manufacturing and lets you spend much more money on R&D, since it is amortised over more devices. While all the big RISC manufacturers had in principle better architectures than Intel, in the '90s Intel killed them off one by one thanks to the insane volumes of the PC market. Only in the server space did PowerPC and SPARC survive. This is what forced the PowerPC-to-Intel transition; Apple had little choice. I always wonder what the outcome would have been if one of the large RISC platforms had been made available in more consumer-level products, e.g. an ATX motherboard for running Linux. Volumes would have been much larger.

Another big factor for Intel was that, financed by their huge cash flow, they had the most advanced fabs, so competitors were often 1-2 generations behind in the available processes.

But now a few things have changed. First of all, Intel got stuck on their 10nm process, so they are no longer the manufacturing leader. Most importantly, TSMC pulled ahead of Intel and offers its services to everyone in the market. For the first time, AMD has a manufacturing advantage over Intel.

And the iPhone happened. It gave Apple an almost endless supply of money and huge volume. Over many years, Apple built up a leading chip-design team. This already paid off big with the iPhone, which has by far the most powerful CPUs in the mobile space. It also gave Apple clear insight into the advantages of really owning the whole platform: designing the software and the CPUs together.

Offering desktop-class CPUs is of course a large additional investment, so it is not a trivial step. But if Apple is willing to do it, it should be very interesting, and it would give hope that they have ambitious plans for the Mac, as it only makes sense if they really push the platform.




The big problem with switching CPUs is the instruction set.

For most people, it doesn't matter. But in some niche domains, it really has an impact. I don't expect a smooth transition for libraries such as BLAS or VMs such as the JVM. You can't simply recompile these; you typically need a human to rewrite SSE, AVX and other tricky low-level code so that performance stays competitive.


> You typically need a human to rewrite SSE, AVX [...] so that performance stays competitive.

Not true. As I commented a few days ago, there is sse2neon (https://github.com/jratcliff63367/sse2neon): for the intrinsics it supports, you only need to add a header to automatically map SSE intrinsics to NEON. There is also simde (https://github.com/nemequ/simde), a larger project that may be more complete. These projects are still immature for sure, but when the ARM Mac becomes a real thing, we will see better libraries that support SIMD across different architectures.


Apple has the Accelerate.framework already, which is hand-tuned per chip-type, and is what most of the libraries call into. I’d imagine a lot of work will have been done to make that as seamless as possible on the new chips.

It’s also kind of useful for a framework team to be able to call up the guy designing the next CPU and say “this bit here is a bottleneck, what can you do about it?”...


Moreover, BLAS is part of the Accelerate framework (https://developer.apple.com/documentation/accelerate), so if your code targets that, no porting is needed.


You still need to recompile and relink. And it's not that simple; Apple's implementation of LAPACK, e.g., is well out of date - it dates back to 2009.


Accelerate is barely used on the Mac for scientific stuff in my experience. People tend to use Intel MKL - see e.g. the Anaconda Python distribution: all of the NumPy/SciPy libs are linked against MKL.

SciPy actually went as far as dropping Accelerate as a supported BLAS implementation in 2018. https://github.com/scipy/scipy/wiki/Dropping-support-for-Acc...


I think it's one of the reasons Apple decided to make such a great leap with Catalina. It's a testing ground for what will happen when they switch to ARM. It's also a clear message to developers to recompile and test their builds against each new version of Xcode and macOS, even if they don't plan any new release. The great pain for users is often the choice between getting security updates and keeping legacy software that has worked perfectly so far.


Two thoughts there:

First, I don't think that fast vector operations are a driving force for much of the Mac market these days. A lot of people who are sensitive to these things are already migrating away because of other wedge issues. I do care, enough to demand Intel's MKL over BLAS, but the optimized vector code issue still doesn't worry me, because my local workloads are lightweight and anything where it really matters is already being pushed out to a server farm somewhere. I've actually been trying to convince my own employer to start letting developers have Linux workstations instead of Macs for a host of other reasons. Notably, I'm just getting tired of having to deal with all the little subtle differences in behavior between Docker's Linux distribution and its Mac distribution. And, as an extension of that, I'd be much more worried about not being able to use the same Docker images in production and development than I am about a minor little thing like how well the vector instructions are being used.

Second, Apple has plenty of resources to handle doing those optimizations themselves. They did it before, with AltiVec, and, while I realize that team was disbanded a very long time ago, I expect the existence of iOS as a gaming platform means that an equivalent team either already exists, or could be ramped up quickly. And I presume that covers the most important factors for what would be noticeable to desktop users, such as Quartz.


There’s already BLAS and JVM for ARM.


How fast are these BLAS and JVM implementations? What are the redistribution licenses?


Depends on what you are willing to pay.

People keep forgetting that OpenJDK is only the reference implementation, and between open source, research and commercial JVMs there are around 10 of them, with support from tiny microcontrollers all the way up to exascale HPC CPUs.


...and, if I remember well, ARM JVMs were generally slow, and require(d) payment for every user you distribute to.

I don't know of any fast BLAS/LAPACK implementation for ARM (but I might be wrong).

So something that works well on x86 and is available for free (in the beer and speech sense) now requires payment, if it is available at all, if I want to support macOS? I guess I'll skip.


OpenJDK has ARM support, including vector instructions support.

As for the other JVMs, or the ART coffee flavour for that matter, they are also quite good; otherwise they would have been out of business long ago.

And I really don't understand the focus on BLAS/LAPACK; if you want that kind of work, make a Linux OEM happy - Apple platforms never cared for HPC work.


But how far has Apple's chip diverged from reference ARM designs? And does ARM preserve backwards compatibility as aggressively as Intel?

Because I see compatibility as the biggest reason for the dominance of x86 and x64.


They present the same interface, and that's all that really matters. (AArch64 is incompatible with the 32-bit instruction set and thumb, FWIW.)


Apple has been pushing developers to use frameworks rather than hand-optimized vector code since before the switch to Intel, however, and that’s good, since AVX is a moving target too. For the libraries I’ve used, the combination of phones and ARM servers means a lot of them already have NEON support, often very competitive.

For the last decade, too, I’d expect some fraction of the heaviest code to have moved to the GPU.


So, I work in a niche domain - scientific software.

For one thing, you can already get scientific libraries on Linux which run on ARM. That's not too much of an issue. BLAS is an API of which there are many implementations.

The issue is that (a) it requires everyone to recompile everything, and (b) projects which are 'legacy' and no longer developed just won't ever switch, so that software won't be runnable. If Apple does a Rosetta equivalent, those programs will run slowly, and if that project ends (as Rosetta did), the software will just stop working. This is pretty much the same problem as where Apple killed 32-bit x86 - there are many apps that just no longer work.


VMs are exactly where it doesn't matter.

Constraining ourselves just to the JVM as VM example, there are implementations for almost any CPU out there, including microcontrollers (e.g. MicroEJ).


JVM bytecode already runs on ARM because of Android - sure, it's not OpenJDK and maybe not even a VM, but there should be more than enough experience to draw on.


Android doesn't use JVM bytecode though. It used to work on Dalvik and probably still does.


And there are other implementations as well.


> And the iPhone happened. It gave Apple an almost endless supply of money and huge volume. Over many years, Apple built up a leading chip-design team.

I'm not sure if this is the case, but my read on it was always that Apple bought PA Semi to bootstrap its chip design efforts.

https://en.wikipedia.org/wiki/P.A._Semi


Yes, they certainly started with the team from PA Semi. But by now their chip department is many times that size.



