It will be faster only for code that uses, or is optimized for, that specific extension. And the examples you give are not really correct.

If you add a supercharger you will get more power, but if the car's transmission is not upgraded, you might just get some broken gears and shafts.

If you add more solar panels to your roof, you might exceed the inverter's capacity, and the extra panels will bring no benefit.

It's true that you will benefit from the changes above, but not by themselves: something else needs to change so you can benefit. And in the case of the M4 and these extensions, the software needs to be changed and also needs a use case for them.




That is an example of the kind of hardware and software synergy that gives Apple Silicon good performance for apps on iOS and macOS. Apple execs have given interviews where they talk about this kind of thing: they look at code in the OS and in their application libraries that can benefit from hardware optimization, and they build in hardware support for it to improve performance overall. This helps all kinds of apps running on the OS and using the standard libraries.


Are you implying there are no use cases for matrix multiply?

In any case, the two main deep learning packages have already been updated, so for the use case this change was almost certainly targeted at, your complaint is already answered. I'm just stunned that anyone would complain about hardware matrix multiplication. I've wondered why it hasn't been ubiquitous for the past 20 years.

Everyone should make that improvement in their hardware. Everyone should get rid of code implementing matrix multiplication and make the hardware call instead (sketched below). It's common sense. Not to put too fine a point on it, but your complaint assumes that Geekbench is based on code that has implemented all those changes.
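
To make that concrete, here is a minimal sketch in C of what "make the hardware call instead" means: replace a hand-rolled triple loop with a call into the vendor BLAS. Accelerate and cblas_sgemm are real APIs; whether a given call actually lands on the matrix unit (AMX/SME) depends on the OS and library version, so treat that routing as an assumption rather than a guarantee.

    /* Build on macOS: cc matmul.c -framework Accelerate */
    #include <Accelerate/Accelerate.h>

    /* Hand-rolled version: the compiler may auto-vectorize this,
       but it will not use the dedicated matrix hardware. */
    void naive_sgemm(int n, const float *a, const float *b, float *c) {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                float acc = 0.0f;
                for (int k = 0; k < n; k++)
                    acc += a[i * n + k] * b[k * n + j];
                c[i * n + j] = acc;
            }
    }

    /* Same computation delegated to the vendor BLAS: C = 1.0*A*B + 0.0*C.
       On Apple Silicon this is where the matrix unit can be used,
       at the library's discretion. */
    void fast_sgemm(int n, const float *a, const float *b, float *c) {
        cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    n, n, n, 1.0f, a, n, b, n, 0.0f, c, n);
    }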


> Are you implying there are no use cases for matrix multiply?

The whole point is that these highly specialized operations only show up in very specialized use cases, and aren't reflected in overall performance.

We've been dealing with the regular release of specialized processor instructions for a couple of decades. This story is not new. You see cherry-picked microbenchmarks used to plot impressive bar charts, immediately followed by the realization that a) in general this sort of operation is rarely invoked frequently enough to be noticeable, b) you need to build your code with specialized flags, or detect the feature at runtime, for software to actually leverage it (see the sketch below), and c) even then it's only noticeable in very specialized workloads that already run in the background.
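
As a sketch of points b) and c): software has to detect the extension and pick a code path at runtime. sysctlbyname() is a real macOS API, but the exact key name for SME below is my assumption, modeled on the hw.optional.arm.FEAT_* keys Apple exposes for other ISA features, so verify it before relying on it.

    #include <stdio.h>
    #include <sys/sysctl.h>

    /* Returns 1 if the named hw.optional.* sysctl exists and is set. */
    static int has_feature(const char *name) {
        int value = 0;
        size_t size = sizeof(value);
        if (sysctlbyname(name, &value, &size, NULL, 0) != 0)
            return 0;  /* key absent: treat the feature as missing */
        return value != 0;
    }

    int main(void) {
        /* "hw.optional.arm.FEAT_SME" is an assumed key name. */
        if (has_feature("hw.optional.arm.FEAT_SME"))
            printf("SME present: dispatch to the matrix-unit path\n");
        else
            printf("SME absent: fall back to the generic NEON path\n");
        return 0;
    }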

I still recall when fused multiply-add was supposed to be such a game changer, because everyone used polynomials and the operation would triple performance. It wasn't the case.
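
For context, the pattern being sold back then was polynomial evaluation by Horner's rule, one fused multiply-add per coefficient. A minimal sketch (fma() is standard C99 <math.h>; link with -lm on non-Apple platforms):

    #include <math.h>

    /* p(x) = c[0]*x^(n-1) + ... + c[n-1], coefficients highest order first.
       Each step incurs a single rounding instead of two; with optimization
       enabled the compiler emits one fused multiply-add per iteration. */
    double horner(const double *c, int n, double x) {
        double acc = c[0];
        for (int i = 1; i < n; i++)
            acc = fma(acc, x, c[i]);
        return acc;
    }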

And more to the point, do you believe that matrix multiplication is a breakthrough discovery that is only now surfacing? Computers were being designed around matrix operations long before they were even considered household items.


I'm not complaining, I'm just saying that the higher numbers in that benchmark result do not translate directly into better performance for all the software you run. Deep learning as it stands is probably the main application that benefits from this extension (and probably the reason it was added in hardware at this point in time).


Well, you're really just describing benchmarks: if a benchmark doesn't represent your standard workflow, then it probably isn't a good reference for you. But Geekbench includes a bunch of components based on real-world applications like file compression, web browsing, and PDF rendering. So it probably isn't perfect, but it's likely that the M4 will feel a bit faster in regular use compared to an older-generation MacBook Pro.



