The last I checked, AMD was outperforming Apple perf/dollar on the high end, tho...

aurareturn · 2025-03-05T15:54:39 1741190079

The M4 Pro is 56% faster in ST performance against AMD’s new Strix Halo while being 3.6x more efficient.

Source: https://www.notebookcheck.net/AMD-Ryzen-AI-Max-395-Analysis-...

Cinebench 2024 results.

hedora · 2025-03-05T16:11:32 1741191092

That’s a laptop part, so it makes different tradeoffs.

Somewhere on the internet there is a TDP wattage vs performance x-y plot. There’s a Pareto-optimal region where all the Apple and AMD parts live. Apple owns low TDP, AMD owns high TDP. They duke it out in the middle. Intel is nowhere close to the line.

I’d guess someone has made one that includes datacenter ARM, but I’ve never seen it.
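For anyone who wants to build that kind of plot themselves, here is a minimal sketch of how the Pareto-optimal set could be picked out of a list of parts. The chip names, wattages, and scores below are made-up placeholders, not measurements, and `Chip` and `pareto_frontier` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Chip(NamedTuple):
    name: str
    tdp_watts: float  # lower is better
    score: float      # higher is better


def pareto_frontier(chips):
    """Return the chips that no other chip beats on both TDP and score."""
    frontier = []
    best_score = float("-inf")
    # Walk from lowest TDP to highest; at equal TDP, look at the faster chip first.
    for chip in sorted(chips, key=lambda c: (c.tdp_watts, -c.score)):
        # A chip stays only if it is faster than everything that draws
        # the same or less power.
        if chip.score > best_score:
            frontier.append(chip)
            best_score = chip.score
    return frontier


# Placeholder data, NOT real benchmark numbers.
chips = [
    Chip("low-power A", 15, 100),
    Chip("low-power B", 15, 80),
    Chip("mid A", 45, 180),
    Chip("mid B", 60, 170),
    Chip("high A", 120, 300),
]

for chip in pareto_frontier(chips):
    print(chip.name, chip.tdp_watts, chip.score)
```

With this placeholder data, "low-power B" and "mid B" fall off the line because another part is at least as fast at the same or lower TDP.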

tomrod · 2025-03-05T17:45:04 1741196704

> tdp wattage vs performance x-y plot

This?

https://www.videocardbenchmark.net/power_performance.html#sc...

echoangle · 2025-03-05T19:31:59 1741203119

That’s GPUs, not CPUs

aurareturn · 2025-03-05T16:19:53 1741191593

High TDP? You mean server-grade CPUs? Apple doesn't make those.

derefr · 2025-03-05T16:33:40 1741192420

True, but these "Ultra" chips do target the same niche as (some) high-TDP chips.

Workstations (like the Mac Studio) have traditionally been a space where "enthusiast"-grade consumer parts (think Threadripper) and actual server parts competed. The owner of a workstation didn't usually care about their machine's TDP; they just cared that it could chew through their workloads as quickly as possible. But, unlike an actual server, workstations didn't need the super-high core count required for multitenant parallelism, and would go idle for long stretches, thus benefitting from (though not requiring) more-efficient power management that could drive down baseline TDP.

aurareturn · 2025-03-05T16:45:42 1741193142

Oh, you mean Threadripper. I thought you were talking about Epyc.

Anyway, I don't think it's comparable really. This thing comes with a fat GPU, NPU, and unified memory. Threadripper is just a CPU.

mort96 · 2025-03-05T21:12:27 1741209147

The GPU and NPU shouldn't be consuming power when not in use. Why shouldn't we compare M3 Ultra to Threadripper?

refulgentis · 2025-03-05T17:45:59 1741196759

> You mean server-grade CPUs? Apple doesn't make those.

Right.

It is coming up because we're in a thread about using them as server CPUs (cf. "colo", "2U" in OP and OP's child), and the person you're replying to is making the same point you are.

For years now, people will comment "these are the best chips, I'd replace all chips with them."

Then someone points out perf/watt is not perf.

Then someone else points out some M-series is much faster than a random CPU.

And someone else points out that the random CPU is not a top performing CPU.

And someone else points out that the M-series is optimized for perf/watt and it'd suck if it wasn't.

I love my MacBook. The M-series has no competitors in the case it's designed for.

I'd just prefer, at this point, that we can skip long threads rehashing it.

It's a great chip. It's not the fastest, and it's better for that. We want perf/watt in our mobile devices. There are fundamental, well-understood engineering tradeoffs which imply that being great at that necessitates the existence of faster processors.

aurareturn · 2025-03-05T17:55:52 1741197352

> It's a great chip. It's not the fastest,

It has the world's fastest single thread.

pjmlp · 2025-03-06T07:36:56 1741246616

Maybe it is, maybe not. UNIX and Windows server software has been multithreaded / multi-process for decades; we want tons of threads and processes, not a single one.

refulgentis · 2025-03-05T18:00:44 1741197644

I can't quite tell what's going on here. Earlier, you seemed to be clear (cf. "Apple doesn't make server-grade CPUs").

aurareturn · 2025-03-05T18:18:55 1741198735

Correct. But their M4 line has the fastest single thread performance in the world.

nameequalsmain · 2025-03-05T19:08:03 1741201683

According to what source? Passmark says otherwise[1]. The fastest Intel CPUs have both a higher single thread and multi thread score in that test.

[1] https://www.cpubenchmark.net/singleThread.html

orangecat · 2025-03-06T00:03:26 1741219406

Passmark shows the M4 chips slower than the M3, so something weird is going on. Geekbench 6 has the M4 well ahead of Intel and AMD, with the M3 about 25% slower like you'd expect: https://www.cpu-monkey.com/en/cpu_benchmark-geekbench_6_sing...

aurareturn · 2025-03-06T03:46:19 1741232779

Passmark is an outdated benchmark not optimized for Arm.

impure-aqua · 2025-03-06T13:56:53 1741269413

I think that is much too hand-wavy regarding the performance differences.

Both Passmark and Geekbench are aggregates of a variety of tasks. If you dig into the individual tests that constitute this aggregate score, you will find different platforms perform better, or worse, on certain tests than others. I would wager that, for many applications, only a subset of these tasks is relevant to the performance of the application, yet such benchmark suites distil out all nuance into a single value.
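To make the "single value" problem concrete, here is a toy sketch. It assumes the aggregate is a plain geometric mean of the subtest scores, which is a common choice for benchmark suites but is not claimed to be the exact formula either Passmark or Geekbench uses. The subtest names and numbers are invented for illustration.

```python
import math


def geometric_mean(scores):
    """Geometric mean of a sequence of positive scores."""
    scores = list(scores)
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


# Invented subtest scores for two hypothetical platforms.
platform_a = {"compression": 200.0, "compile": 200.0, "matrix_math": 50.0}
platform_b = {"compression": 100.0, "compile": 100.0, "matrix_math": 200.0}

aggregate_a = geometric_mean(platform_a.values())
aggregate_b = geometric_mean(platform_b.values())

# The headline numbers match, yet one subtest differs by 4x.
matrix_ratio = platform_b["matrix_math"] / platform_a["matrix_math"]

print(round(aggregate_a, 2), round(aggregate_b, 2), matrix_ratio)
```

Both toy platforms get the same headline score, but an application that only cares about the "matrix_math" row would see a 4x gap.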

Here is a personal anecdote. I have tried running CASTEP (built from source), a density functional theory calculator, on both an M1 Max MacBook Pro [0], and on a Ryzen 7840HS Lenovo laptop [1]. A cursory glance at those Geekbench results linked might make you expect that the performance is roughly equivalent, but the Ryzen outperforms the Mac by about 4x, a huge difference.

What happens if we try and dig into any particular benchmark to explain this? If you click on any particular benchmark in the Geekbench search lists, you will see they test things like "File Compression", "HTML5 Browser", "Clang". Which of these maps most closely to the sorts of instructions used in CASTEP? Your guess is as good as mine.

If anything, I would say Passmark is quite a bit less abstract about this. Looking at the Mac [2] and Ryzen [3] Passmark results, you can see the Ryzen outperforms the Mac by about 2x on "extended instructions", which appear to involve some matrix math, and also about 2x on "integer math". The Mac, meanwhile, appears to be extremely good at finding prime numbers, at over 3x the speed of the Ryzen. Presumably the Ryzen's balance of instruction performance is more useful for DFT calculations than the Mac's, which perhaps is weaker in areas that might matter for this application, but stronger in areas that might matter for others.

Of course, optimization is likely a component of this. How much effort is put into the OpenBLAS, MPI, etc, implementations on aarch64 darwin vs. x86-64 linux? This is a good question. It is, however, mostly irrelevant to the end consumer, who wishes to consume this software for use in their further research, rather than dig into high-performance computing library optimization.

[0] https://browser.geekbench.com/search?q=m1+max

[1] https://browser.geekbench.com/search?q=7840hs

[2] https://www.cpubenchmark.net/cpu.php?cpu=Apple+M1+Max+10+Cor...

[3] https://www.cpubenchmark.net/cpu.php?cpu=AMD+Ryzen+7+PRO+784...

seec · 2025-03-06T15:41:48 1741275708

This is my experience as well. Geekbench heavily favors the type of workload that runs best on Apple hardware (those tend to be the general case, most likely to be used by the masses), but in practice, if you have complex software to run, your experience will not match the bench numbers.

I think PassMark is more honest as well, because it just gives scores for calculation throughput instead of specific tasks. It more closely matches the experience you will get if you have a varied load.

But since it's Apple we are talking about, their users just want to think they have the best and that's all that matters.

krunkcoin · 2025-03-09T07:59:24 1741507164

PassMark is "more honest"? It represents a varied load??? No, sorry, it's just not good. Seriously, read their own documentation.

https://www.cpubenchmark.net/cpu_test_info.html

Right from the top it's amateurish stuff: their idea of an integer benchmark to measure "raw" CPU throughput (whatever that means) is to make a bunch of random ints and add/subtract/multiply/divide them.

Very few programs do a high volume of either integer multiply or divide. And when they do, they generally aren't doing it on random numbers. This is the kind of thing which gives synthetic benchmarks their highly deserved bad rep. It might be even worse than Dhrystone MIPS, and believe me, in benchmark nerd circles, that is a fucking diss.

If you look up Geekbench's docs, you'll find that it's all about real-world compute tasks. For example, one of the int tests in their suite is to compile a reference program with the Clang compiler. Compilers are a reasonably good litmus test of integer performance; they heavily stress the CPU features most responsible for high integer performance in this day and age. (Branch prediction, memory prefetching, out-of-order execution, speculation, that kind of thing.)

You claimed that PassMark reflects "complex" software, and Geekbench doesn't. However, I would be willing to bet that Clang alone is far more complex than all of PassMark's CPU benchmarks put together, whether you measure by SLOC or program structure.

Note that none of this has anything to do with Mac vs PC. PassMark is simply a bad benchmark that should not be used, period. That said, there are a bunch of warning signs that PassMark's ports to everything outside its native x86 Windows are probably quite sloppy, so it's even less useful for cross-platform comparisons.

aurareturn · 2025-03-07T03:35:50 1741318550

Geekbench correlates with SPEC, the industry standard in CPU benchmarking and what enterprise companies such as AWS use to judge CPU performance. It has a 0.99 correlation.

https://medium.com/silicon-reimagined/performance-delivered-...

Passmark is an outdated benchmark that isn't updated to use ARM instructions.
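For reference, a correlation figure like that is usually a Pearson coefficient computed over the same set of CPUs scored by both suites. A minimal sketch follows; the two score lists are invented placeholders, not real Geekbench or SPEC results, and `pearson` is just an illustrative helper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented scores for five hypothetical CPUs under two suites.
suite_a = [1000, 1500, 2000, 2500, 3000]
suite_b = [5.1, 7.4, 10.2, 12.3, 15.1]

r = pearson(suite_a, suite_b)
print(round(r, 3))
```

A coefficient close to 1 means the two suites rank and space the CPUs almost identically across the set, though it does not say anything about one specific workload on one specific chip.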

worthless-trash · 2025-03-06T02:21:49 1741227709

I keep seeing people repeat incorrect rhetoric about Apple hardware, like this example.

There are nice things that Apple has, but as you can see there is significant reality warping going on.

Why does it persist?

worthless-trash · 2025-03-06T10:03:11 1741255391

Never change, HN. Can't criticise Apple here.

refulgentis · 2025-03-05T19:09:04 1741201744

Well, no, right?

The M4 Max had great, I would argue the best at time of release, single core results on Geekbench.

That is a different claim from "the M4 line has the top single-thread performance in the world".

I'm curious:

You're signalling both that you understand the fundamental tradeoff ("Apple doesn't make server-grade CPUs") and that you are talking about something else (following up with "the M4 family has top single-thread performance").

What drives that? What's the other thing you're hoping to communicate?

If you are worried that leaving it at "Apple doesn't make server-grade CPUs" will make people think M4s aren't as great as they are: this is a technical-enough audience, and I think we'll understand :) It doesn't come across as denigrating the M-series, but as understanding a fundamental, physically-based tradeoff.

diggan · 2025-03-05T16:35:41 1741192541

Isn't the rack-mounted Mac Pro supposedly "server-grade" (https://www.apple.com/shop/buy-mac/mac-pro/rack)?

At least judging by the mounts, they want them to be used that way, even though the CPU might not fit with the de facto industry label for "server-grade".

jjcob · 2025-03-06T06:44:04 1741243444

The rack mount Mac Pro doesn't really make sense for a data center. It's 5U high, which is much too big for a data center. It doesn't have standard server features like redundant power supplies.

The only use case I can think of is for audio workstations, where people have lots of rack mount equipment, so you can have everything including the computer in the rack. But even for that use case it's quite big.

aurareturn · 2025-03-05T16:46:26 1741193186

Server grade CPUs. I thought he was referring to Epyc CPUs.

yxhuvud · 2025-03-05T19:11:42 1741201902

It also includes gaming machines. Of course, Apple doesn't make those either.

hedora · 2025-03-05T16:47:13 1741193233

Indeed. The M3 Ultra is in the midrange where they duke it out. Similarly, for its niche, the iPhone CPU is way better than AMD’s low end processors.

Anyway, the Apple config in the article costs about 5x more than a comparable low end AMD server with 512GB of RAM, but adds an NPU. AMD has NPUs in lower end stuff; not sure about this TDP range.

lukevp · 2025-03-05T23:54:37 1741218877

How is that comparable? On-package RAM is lower latency and higher bandwidth and also much more expensive than external DDR5 sticks.

nick_ · 2025-03-05T16:01:46 1741190506

Same. I'm not sure what to make of the various claims. I personally defer to this table in general: https://www.cpubenchmark.net/power_performance.html.

I'm not sure how those benchmarks translate to common real world use cases.