It will be faster only for code that uses/is optimized for that specific extensi...

Tagbert · 2024-11-01T02:58:38 1730429918

That is an example of the kind of hardware and software synergy that gives good performance on Apple Silicon for apps in iOS and macOS. Apple execs have given interviews where they talk about this kind of thing. They look at code in the OS and in their application libraries that can benefit from hardware optimization, and they build in hardware support for it to improve performance overall. This helps all kinds of apps running on the OS and using the standard libraries.

bilbo0s · 2024-10-31T10:17:54 1730369874

Are you implying there are no use cases for matrix multiply?

In any case, the two main deep learning packages have already been updated, so for the place this change was almost certainly targeted at, your complaint is answered. I'm just stunned that anyone would complain about hardware matrix multiplication. I've wondered why that hasn't been ubiquitous for the past 20 years.

Everyone should make that improvement in their hardware. Everyone should get rid of code implementing matrix multiplication and make the hardware call instead. It's common sense. Not to put too fine a point on it, but your complaint assumes that Geekbench is based on code that has implemented all those changes.

chipdart · 2024-11-01T06:36:02 1730442962

> Are you implying there are no use cases for matrix multiply?

The whole point is that these highly specialized scenarios only show up in very specialized use cases, and aren't reflected in overall performance.

We've been dealing with the regular release of specialized processor operations for a couple of decades. This story is not new. You see cherry-picked microbenchmarks used to plot impressive bar charts, immediately followed by the realization that a) in general this sort of operation is rarely invoked with enough frequency to be noticeable, b) you need to build code with specialized flags to get software to actually leverage the feature, and c) even then it's only noticeable in very specialized workloads that already run in the background.

I still recall when fused multiply-add was supposed to be such a game changer, because everyone used polynomials and these operations would triple performance. Not the case.
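For anyone wondering why polynomials came up with fused multiply-add: a minimal sketch, assuming Horner's method is the kind of polynomial evaluation meant here. The function name `horner` and the sample coefficients are made up for illustration. Each loop step is one multiply followed by one add, which is the pattern an FMA instruction can do in a single operation. Plain Python makes no promise about fusing anything; this only shows the shape of the computation.

```python
def horner(coeffs, x):
    """Evaluate a polynomial at x.

    coeffs are ordered from the highest-degree term down to the constant.
    (Hypothetical helper, written only to illustrate the multiply-add pattern.)
    """
    acc = 0.0
    for c in coeffs:
        # One multiply and one add per coefficient: the step FMA hardware can fuse.
        acc = acc * x + c
    return acc


# 2x^3 - 6x^2 + 2x - 1 evaluated at x = 3
value = horner([2.0, -6.0, 2.0, -1.0], 3.0)
print(value)
```

Even in the best case the fused instruction only speeds up the arithmetic inside this loop, so the overall gain depends on how much of a program's time is actually spent there.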

And more to the point, do you believe that matrix multiplication is a breakthrough discovery that is only now surfacing? Computers were being designed around matrix operations way before they were even considered to be in a household.

miahi · 2024-10-31T10:28:25 1730370505

I'm not complaining, I'm just saying that the higher numbers in that benchmark result do not translate directly to better performance for all the software you run. Deep learning as it is right now is probably the main application that benefits from this extension (and probably the reason why it was added in hardware at this point in time).

parsimo2010 · 2024-10-31T12:11:31 1730376691

Well, you're really just describing benchmarks: if the benchmark doesn't represent your standard workflow, then it probably isn't a good reference for you. But Geekbench includes a bunch of components based on real-world applications like file compression, web browsing, and PDF rendering. So it probably isn't perfect, but it's likely that the M4 will feel a bit faster in regular use compared to an older generation MacBook Pro.