
The last I checked, AMD was outperforming Apple on perf/dollar at the high end, though they were close on perf/watt for the TDPs where their parts overlapped.

I’d be curious to know if this changes that. It’d take a lot more than doubling cores to take out the very high power AMD parts, but this might squeeze them a bit.

Interestingly, AMD has also been investing heavily in unified RAM. I wonder if they have / plan an SoC that competes 1:1 with this. (Most of the parts I’m referring to are set up for discrete graphics.)



The M4 Pro is 56% faster in ST performance against AMD’s new Strix Halo while being 3.6x more efficient.

Source: https://www.notebookcheck.net/AMD-Ryzen-AI-Max-395-Analysis-...

Cinebench 2024 results.
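
For what it's worth, those two figures together imply a power ratio. A rough sketch, assuming "efficiency" here means benchmark points per watt (my assumption, not something stated above), with `implied_power_ratio` as a made-up helper name:

```python
def implied_power_ratio(speedup: float, efficiency_ratio: float) -> float:
    """Power draw of part A relative to part B.

    speedup          = perf_A / perf_B
    efficiency_ratio = (perf_A / power_A) / (perf_B / power_B)
    Dividing the first by the second leaves power_A / power_B.
    """
    return speedup / efficiency_ratio


# 56% faster and 3.6x more efficient, as quoted in the comment above.
ratio = implied_power_ratio(1.56, 3.6)
print(f"A draws about {ratio:.2f}x the power of B in this test")
```

Under that assumption the faster part would be drawing well under half the power during the run.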


That’s a laptop part, so it makes different tradeoffs.

Somewhere on the internet there is a tdp wattage vs performance x-y plot. There’s a Pareto-optimal region where all the Apple and AMD parts live. Apple owns low TDP, AMD owns high TDP. They duke it out in the middle. Intel is nowhere close to the line.

I’d guess someone has made one that includes datacenter ARM, but I’ve never seen it.
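
If anyone wants to rebuild that kind of plot, picking out the Pareto-optimal parts is simple. A minimal sketch; the part names and numbers below are invented for illustration and are not real benchmark data:

```python
from typing import NamedTuple


class Part(NamedTuple):
    name: str
    tdp_watts: float
    score: float


def pareto_front(parts):
    """Return the parts that no other part dominates.

    Part q dominates part p if q has TDP <= p's and score >= p's,
    with at least one of the two strictly better.
    """
    front = []
    for p in parts:
        dominated = any(
            q.tdp_watts <= p.tdp_watts
            and q.score >= p.score
            and (q.tdp_watts < p.tdp_watts or q.score > p.score)
            for q in parts
        )
        if not dominated:
            front.append(p)
    return sorted(front, key=lambda p: p.tdp_watts)


# Hypothetical parts: (name, TDP in watts, benchmark score).
PARTS = [
    Part("low-power A", 15, 100),
    Part("low-power B", 15, 80),
    Part("mid A", 60, 220),
    Part("mid B", 65, 210),
    Part("high A", 250, 400),
    Part("high B", 280, 390),
]

front_names = [p.name for p in pareto_front(PARTS)]
print(front_names)
```

Anything off the returned list is "not close to the line" in the sense used above: some other part is at least as fast at the same or lower TDP.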


> tdp wattage vs performance x-y plot

This?

https://www.videocardbenchmark.net/power_performance.html#sc...


That’s GPUs, not CPUs


High TDP? You mean server-grade CPUs? Apple doesn't make those.


True, but these "Ultra" chips do target the same niche as (some) high-TDP chips.

Workstations (like the Mac Studio) have traditionally been a space where "enthusiast"-grade consumer parts (think Threadripper) and actual server parts competed. The owner of a workstation didn't usually care about their machine's TDP; they just cared that it could chew through their workloads as quickly as possible. But, unlike an actual server, workstations didn't need the super-high core count required for multitenant parallelism, and would go idle for long stretches, thus benefiting from (though not requiring) more-efficient power management that could drive down baseline power draw.


Oh, you mean Threadripper. I thought you were talking about Epyc.

Anyway, I don't think it's comparable really. This thing comes with a fat GPU, NPU, and unified memory. Threadripper is just a CPU.


The GPU and NPU shouldn't be consuming power when not in use. Why shouldn't we compare M3 Ultra to Threadripper?


> You mean server-grade CPUs? Apple doesn't make those.

Right.

It is coming up because we're in a thread about using them as server CPUs (cf. "colo", "2U" in OP and OP's child), and the person you're replying to is making the same point you are.

For years now, people have commented "these are the best chips, I'd replace all chips with them."

Then someone points out perf/watt is not perf.

Then someone else points out some M-series is much faster than a random CPU.

And someone else points out that the random CPU is not a top performing CPU.

And someone else points out M-series are optimized for perf/watt and it'd suck if they weren't.

I love my MacBook, the M-series has no competitors in the case it's designed for.

I'd just prefer, at this point, that we can skip long threads rehashing it.

It's a great chip. It's not the fastest, and it's better for that. We want perf/watt in our mobile devices. There are fundamental, well-understood engineering tradeoffs that imply being great at that necessitates the existence of faster processors.


> It's a great chip. It's not the fastest,

It has the world's fastest single thread.


Maybe it is, maybe not. UNIX and Windows server software has been multithreaded / multi-process for decades; we want tons of threads and processes, not a single one.


I can't quite tell what's going on here. Earlier, you seemed to be clear (cf. "Apple doesn't make server-grade CPUs").


Correct. But their M4 line has the fastest single thread performance in the world.


According to what source? Passmark says otherwise[1]. The fastest Intel CPUs have both a higher single thread and multi thread score in that test.

[1] https://www.cpubenchmark.net/singleThread.html


Passmark shows the M4 chips slower than the M3, so something weird is going on. Geekbench 6 has the M4 well ahead of Intel and AMD, with the M3 about 25% slower like you'd expect: https://www.cpu-monkey.com/en/cpu_benchmark-geekbench_6_sing...


Passmark is an outdated benchmark not optimized for Arm.


I think that is much too hand-wavy regarding the performance differences.

Both Passmark and Geekbench are aggregates of a variety of tasks. If you dig into the individual tests that constitute this aggregate score, you will find different platforms perform better, or worse, on certain tests than others. I would wager that, for many applications, only a subset of these tasks are relevant to the performance of the application, yet such benchmark suites distil out all nuance into a single value.
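
To make that concrete, here is a toy sketch of how a single aggregate can hide the differences. The subtest names, scores and weights are all invented, and the plain geometric mean is only a stand-in; I am not claiming it is the exact formula either suite uses.

```python
import math


def geometric_mean(scores):
    """Unweighted geometric mean of a collection of positive scores."""
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


def weighted_geometric_mean(scores, weights):
    """Geometric mean where each subtest counts in proportion to its weight."""
    total = sum(weights.values())
    return math.exp(
        sum(weights[name] * math.log(scores[name]) for name in weights) / total
    )


# Hypothetical subtest scores for two platforms.
PLATFORM_X = {"compression": 200.0, "matrix_math": 50.0, "compile": 100.0}
PLATFORM_Y = {"compression": 50.0, "matrix_math": 200.0, "compile": 100.0}

# The headline number is identical for both platforms.
overall_x = geometric_mean(PLATFORM_X.values())
overall_y = geometric_mean(PLATFORM_Y.values())

# A hypothetical application that is almost entirely matrix math.
APP_WEIGHTS = {"compression": 0.0, "matrix_math": 0.9, "compile": 0.1}
app_x = weighted_geometric_mean(PLATFORM_X, APP_WEIGHTS)
app_y = weighted_geometric_mean(PLATFORM_Y, APP_WEIGHTS)

print(f"overall: X={overall_x:.1f} Y={overall_y:.1f}")
print(f"app-weighted: X={app_x:.1f} Y={app_y:.1f} (Y/X = {app_y / app_x:.2f})")
```

Both platforms get the same overall score, yet the app-weighted score differs by more than 3x, which is the kind of gap a single headline number cannot show.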

Here is a personal anecdote. I have tried running CASTEP (built from source), a density functional theory calculator, on both an M1 Max MacBook Pro [0], and on a Ryzen 7840HS Lenovo laptop [1]. A cursory glance at those Geekbench results linked might make you expect that the performance is roughly equivalent, but the Ryzen outperforms the Mac by about 4x, a huge difference.

What happens if we try and dig into any particular benchmark to explain this? If you click on any particular benchmark in the Geekbench search lists, you will see they test things like "File Compression", "HTML5 Browser", "Clang". Which of these maps most closely to the sorts of instructions used in CASTEP? Your guess is as good as mine.

If anything, I would say Passmark is quite a bit less abstract about this. Looking at the Mac [2] and Ryzen [3] Passmark results, you can see the Ryzen outperforms the Mac by about 2x on "extended instructions", which appear to involve some matrix math, and also about 2x on "integer math". The Mac, meanwhile, appears to be extremely good at finding prime numbers, at over 3x the speed of the Ryzen. Presumably the Ryzen's balance of instruction performance is more useful for DFT calculations than the Mac's, which perhaps is weaker in areas that might matter for this application, but stronger in areas that might matter for others.

Of course, optimization is likely a component of this. How much effort is put into the OpenBLAS, MPI, etc., implementations on aarch64 darwin vs. x86-64 linux? This is a good question. It is, however, mostly irrelevant to the end consumer, who wishes to consume this software for use in their further research, rather than dig into high-performance computing library optimization.

[0] https://browser.geekbench.com/search?q=m1+max

[1] https://browser.geekbench.com/search?q=7840hs

[2] https://www.cpubenchmark.net/cpu.php?cpu=Apple+M1+Max+10+Cor...

[3] https://www.cpubenchmark.net/cpu.php?cpu=AMD+Ryzen+7+PRO+784...


This is my experience as well. Geekbench heavily favors the type of workload that runs best on Apple hardware (those tend to be general-case workloads, the ones most likely to be used by the masses), but in practice, if you have complex software to run, your experience will not match the benchmark numbers.

I think PassMark is more honest as well, because it just gives scores for calculation throughput instead of specific tasks. It more closely matches the experience you will get if you have a varied load.

But since it's Apple we are talking about, their users just want to think they have the best and that's all that matters.


PassMark is "more honest"? It represents a varied load??? No, sorry, it's just not good. Seriously, read their own documentation.

https://www.cpubenchmark.net/cpu_test_info.html

Right from the top it's amateurish stuff: their idea of an integer benchmark to measure "raw" CPU throughput (whatever that means) is to make a bunch of random ints and add/subtract/multiply/divide them.

Very few programs do a high volume of either integer multiply or divide. And when they do, they generally aren't doing it on random numbers. This is the kind of thing which gives synthetic benchmarks their highly deserved bad rep. It might be even worse than Dhrystone MIPS, and believe me, in benchmark nerd circles, that is a fucking diss.
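
For anyone who hasn't seen this style of test, here is a rough sketch of the shape being described: generate random ints, then add/subtract/multiply/divide them in a loop and report operations per second. This is my own illustration, not PassMark's code, and in Python the number mostly reflects interpreter overhead rather than the CPU's integer units.

```python
import random
import time


def random_int_ops(n: int, seed: int = 0):
    """Time four arithmetic ops on n pairs of random ints.

    Returns (operations per second, checksum). The checksum only exists
    so the arithmetic results are actually used.
    """
    rng = random.Random(seed)
    a = [rng.randint(1, 1_000_000) for _ in range(n)]
    b = [rng.randint(1, 1_000_000) for _ in range(n)]  # never zero, so // is safe

    checksum = 0
    start = time.perf_counter()
    for x, y in zip(a, b):
        checksum ^= (x + y) ^ (x - y) ^ (x * y) ^ (x // y)
    # Guard against a zero reading from a coarse timer.
    elapsed = max(time.perf_counter() - start, 1e-9)

    return 4 * n / elapsed, checksum


ops_per_second, checksum = random_int_ops(100_000)
print(f"{ops_per_second:,.0f} ops/s")
```

Note what is missing: no data-dependent branches to predict, no pointer chasing, no real working set. That is the complaint.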

If you look up Geekbench's docs, you'll find that it's all about real-world compute tasks. For example, one of the int tests in their suite is to compile a reference program with the Clang compiler. Compilers are a reasonably good litmus test of integer performance; they heavily stress the CPU features most responsible for high integer performance in this day and age. (Branch prediction, memory prefetching, out-of-order execution, speculation, that kind of thing.)

You claimed that PassMark reflects "complex" software, and Geekbench doesn't. However, I would be willing to bet that Clang alone is far more complex than all of PassMark's CPU benchmarks put together, whether you measure by SLOC or program structure.

Note that none of this has anything to do with Mac vs PC. PassMark is simply a bad benchmark that should not be used, period. That said, there are a bunch of warning signs that PassMark's ports to everything outside its native x86 Windows are probably quite sloppy, so it's even less useful for cross-platform comparisons.


Geekbench correlates with SPEC, the industry-standard CPU benchmark and what enterprise companies such as AWS use to judge a CPU's performance. The correlation is 0.99.

https://medium.com/silicon-reimagined/performance-delivered-...

PassMark is an outdated benchmark that hasn't been updated to use ARM instructions.
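
For anyone wanting to check a correlation claim like this against their own data, a minimal sketch of the Pearson coefficient follows. The two score lists are made up for illustration and are not Geekbench or SPEC results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for the same five CPUs on two different suites.
SUITE_A = [1000, 1500, 2100, 2600, 3200]
SUITE_B = [4.1, 6.0, 8.7, 10.2, 13.1]

r = pearson(SUITE_A, SUITE_B)
print(f"r = {r:.3f}")
```

A high r only says the two suites rank and scale CPUs similarly across the sample; it says nothing about how any one application behaves.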


I keep seeing people repeat incorrect rhetoric about Apple hardware, like this example.

There are nice things that Apple has, but as you can see there is significant reality warping going on.

Why does it persist?


Never change hn, can't criticise apple here.


Well, no, right?

The M4 Max had great, I would argue the best at time of release, single core results on Geekbench.

That is a different claim from "the M4 line has the top single-thread performance in the world."

I'm curious:

You're signalling both that you understand the fundamental tradeoff ("Apple doesn't make server-grade CPUs") and that you are talking about something else (the follow-up that the M4 family has top single-thread performance).

What drives that? What's the other thing you're hoping to communicate?

If you are worried that leaving it at "Apple doesn't make server-grade CPUs" will make people think M4s aren't as great as they are: this is a technical-enough audience, and I think we'll understand :) It doesn't come across as denigrating the M-series, but as understanding a fundamental, physically based tradeoff.


Isn't the rack-mounted Mac Pro supposedly "server-grade" (https://www.apple.com/shop/buy-mac/mac-pro/rack)?

At least judging by the mounts, they want them to be used that way, even though the CPU might not fit with the de facto industry label for "server-grade".


The rack mount Mac Pro doesn't really make sense for a data center. It's 5U high, which is much too big for a data center. It doesn't have standard server features like redundant power supplies.

The only use case I can think of is for audio workstations, where people have lots of rack mount equipment, so you can have everything including the computer in the rack. But even for that use case it's quite big.


Server grade CPUs. I thought he was referring to Epyc CPUs.


It also includes gaming machines. Of course, Apple also doesn't make those.


Indeed. The M3 Ultra is in the midrange where they duke it out. Similarly, for its niche, the iPhone CPU is way better than AMD’s low end processors.

Anyway, the Apple config in the article costs about 5x more than a comparable low end AMD server with 512GB of RAM, but adds an NPU. AMD has NPUs in lower end stuff; not sure about this TDP range.


How is that comparable? On-package RAM is lower latency and higher bandwidth and also much more expensive than external DDR5 sticks.


Same. I'm not sure what to make of the various claims. I personally defer to this table in general: https://www.cpubenchmark.net/power_performance.html.

I'm not sure how those benchmarks translate to common real world use cases.



