2022 Mac Studio (20-core M1 Ultra) Review (hrtapps.com)
276 points by selimnairb on April 28, 2022 | 215 comments



Above, we’re looking at parallel performance of the NASA USM3D CFD solver as it computes flow over a classic NACA 0012 airfoil section at low speed conditions.

If this solver relies on matrix multiplication and uses the macOS Accelerate framework, you are seeing this speedup because M1 Macs have AMX matrix multiplication co-processors. In single-precision GEMM, the M1 is faster than an 8-core Ryzen 3700X and a bit slower than a 12-core Ryzen 5900X. The M1 Pro doubles the GFLOPS of the M1 (due to having AMX co-processors for both performance core clusters). And the M1 Ultra again doubles the GFLOPS (4 performance core clusters, each with an AMX unit).

Single-precision matrix multiplication benchmark results for the Ryzen 3700X/3900X and Apple M1/M1 Pro/M1 Ultra are here:

https://twitter.com/danieldekok/status/1511348597215961093?s...
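For anyone who wants to see what "going through Accelerate" looks like, here is a minimal sketch of timing a single-precision GEMM call through Accelerate's CBLAS interface. The matrix size and the single un-warmed call are arbitrary simplifications, not the benchmark from the tweet:

    /* sgemm_demo.c -- build with: clang -O2 sgemm_demo.c -framework Accelerate */
    #include <Accelerate/Accelerate.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void) {
        const int n = 2048;                       /* arbitrary matrix size */
        float *A = malloc(sizeof(float) * n * n);
        float *B = malloc(sizeof(float) * n * n);
        float *C = malloc(sizeof(float) * n * n);
        for (long i = 0; i < (long)n * n; i++) { A[i] = 1.0f; B[i] = 2.0f; C[i] = 0.0f; }

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        /* C = 1.0*A*B + 0.0*C; on Apple silicon, Accelerate reportedly dispatches this to the AMX units */
        cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    n, n, n, 1.0f, A, n, B, n, 0.0f, C, n);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
        printf("%dx%d SGEMM: %.3f s, %.1f GFLOPS\n", n, n, secs, 2.0 * n * n * (double)n / secs / 1e9);
        return 0;
    }

On Apple silicon, Accelerate is the supported route to the AMX units; a generic BLAS such as OpenBLAS runs the same call on the performance cores' NEON units instead (the OpenBLAS comparison comes up further down the thread).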


The 5900X can do 24 threads, so why artificially limit it to 16 threads? Unless that's because the M1 Ultra has 16 performance cores. Why include the 3 year old 3700X? Why not include the 32 thread 5950X which can be had in a complete system for 1/3 the price of the Mac Studio M1 Ultra?


This isn’t a scientific study, it’s one guy who reviews the pro Macs he gets to work on. Also he’s been using the same benchmark for years and the figures for previous Mac Pros are historical, so past tests weren’t conducted with any knowledge of the M1 architecture.


They clearly just picked 16 threads because it is enough to demonstrate the point.

M1 only has 4 performance cores and they still ran it with 16 threads. So it's not like they are trying to skew the benchmark by not showing the potential of the 5900x.

Also the 5950x is 2 years old and still is an active product.


The performance generally regresses when using more threads on the Ryzen CPUs.

Also, maybe I can only benchmark on the hardware that I actually have?


The 5900X is a fine reference as a common chip. Peak performance should be at 12 threads, since it has 12 cores, and that's what the table appears to show.


It would make more sense to compare systems with similar price points.

In this case that would mean ThreadRipper systems and not Ryzen.


Sure, if I had a ThreadRipper, I’d be happy to add the results. I don’t have an M1 Ultra either, but someone was kind enough to do the benchmark for me.


For the guy who, you know, actually did the comparison here... Of _course_ it makes the most sense to compare the new system he has on his desk with the previous systems he's already run the relevant performance tests on as comparison points.

If you want a comparison between a Threadripper system and an M1 Ultra system, I'd be fascinated to read your blog post with your results.


The scaling curve is different from the other CPUs', which implies the performance bottleneck is not in matrix multiplication (which would scale nearly linearly on the older CPUs because it is embarrassingly parallel).


CFD is highly memory-bandwidth-bottlenecked; it is in fact pretty much the prototypical memory-bandwidth-bottlenecked task.

The performance scaling you see between systems pretty much corresponds to the memory bandwidth in those configurations.

Note that on the M1, the CPU can only access a fraction (about 25% iirc) of the total memory bandwidth; you have to use the GPU to really get the full performance of the M1 here.


I saw this quantified (I think at anandtech), something like 220GB/sec out of 400GB/sec on the M1 Max. So something north of 50%.

Also keep in mind that normal x86-64s, even without an IGP, only get about 60-65% of peak, even with nothing else sharing the memory bus. I often see this quantified with McCalpin's STREAM benchmark.

So the M1 Ultra likely has a pretty impressive memory bandwidth of around 440GB/sec, which isn't a large fraction of 800GB/sec, but it's still more than any other desktop or server chip I know of. The AMD Epyc maxes out at 8 channels of DDR4-3200, which is in the neighborhood of 208GB/sec peak, with an observed bandwidth of 110-120GB/sec.
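For reference, the heart of McCalpin's STREAM is a set of simple kernels like the "triad" below; this is a minimal sketch (array size and flags are arbitrary here, and the real benchmark runs four such kernels several times and reports the best rate):

    /* triad.c -- a minimal STREAM-style kernel; build with OpenMP, e.g. gcc -O2 -fopenmp triad.c */
    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    #define N (1 << 26)   /* 64M doubles per array (512 MiB each), far bigger than any cache */

    int main(void) {
        double *a = malloc(N * sizeof(double));
        double *b = malloc(N * sizeof(double));
        double *c = malloc(N * sizeof(double));
        for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

        double t0 = omp_get_wtime();
        #pragma omp parallel for
        for (long i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];      /* the "triad": two streaming reads, one streaming write */
        double t1 = omp_get_wtime();

        /* count 3 arrays of N doubles crossing the memory bus (write-allocate traffic not included) */
        printf("Triad: %.1f GB/s\n", 3.0 * N * sizeof(double) / 1e9 / (t1 - t0));
        return 0;
    }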


> I saw this quantified (I think at anandtech)

Correct. The numbers we have are from their M1 Max deep dive, with the M1 Ultra being two M1 Max chips fused together.

For CPU cores:

>Adding a third thread there’s a bit of an imbalance across the clusters, DRAM bandwidth goes to 204GB/s, but a fourth thread lands us at 224GB/s and this appears to be the limit on the SoC fabric that the CPUs are able to achieve, as adding additional cores and threads beyond this point does not increase the bandwidth to DRAM at all. It’s only when the E-cores, which are in their own cluster, are added in, when the bandwidth is able to jump up again, to a maximum of 243GB/s.

https://www.anandtech.com/show/17024/apple-m1-max-performanc...

GPU cores:

>I haven’t seen the GPU use more than 90GB/s (measured via system performance counters). While I’m sure there’s some productivity workload out there where the GPU is able to stretch its legs, we haven’t been able to identify them yet.

Other:

>That leaves everything else which is on the SoC, media engine, NPU, and just workloads that would simply stress all parts of the chip at the same time. The new media engine on the M1 Pro and Max are now able to decode and encode ProRes RAW formats, the above clip is a 5K 12bit sample with a bitrate of 1.59Gbps, and the M1 Max is not only able to play it back in real-time, it’s able to do it at multiple times the speed, with seamless immediate seeking. Doing the same thing on my 5900X machine results in single-digit frames. The SoC DRAM bandwidth while seeking around was at around 40-50GB/s – I imagine that workloads that stress CPU, GPU, media engines all at the same time would be able to take advantage of the full system memory bandwidth, and allow the M1 Max to stretch its legs and differentiate itself more from the M1 Pro and other systems.


> of around 440GB/sec, which isn't a large fraction of 800GB/sec

Where's the 800GB/s from?


Peak = never observed, but calculated from clock speed * bus width. Much like the speed of light, you'll never see it.

That number for the M1 Ultra (from the OP's post) = 800GB/sec. McCalpin's STREAM benchmark is often cited as a practical/useful number for usable bandwidth using a straightforward implementation in C or Fortran without trying to play games, much like the vast majority of codes out there.
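To make the peak number concrete, assuming the commonly reported configuration of LPDDR5-6400 on a 1024-bit interface for the M1 Ultra:

    6400e6 transfers/s * (1024 bits / 8) = 6400e6 * 128 bytes ≈ 819 GB/s, rounded to the marketed "800GB/sec"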

Also note that x86-64 CPUs use a strict memory model that results in a lower fraction of observed bandwidth vs. peak. Arm has a looser memory model, which achieves a higher fraction of peak.


> memory-bandwidth-bottlenecked task.

The thing that's interesting for me around this is: I hate the on-package memory. I hate the idea of not being able to upgrade after your order, or a few years down the road. It has both practical problems and offends my inner nerd.

But.

This is a useful example of why there's value in it. It seems unlikely that you'd get as good a result with the traditional memory architecture.


People who need this sort of system aren't going to be looking to expand memory capacity in "a few years down the road." They'll be looking to purchase or lease the current model.

Also, if you really need more memory - buy the config with more memory and sell the old config. Resale on macs is usually superb.

I can't remember hearing a friend or gaming buddy say "I upgraded my ram." I haven't upgraded ram in any machine I've owned in twenty years and even before that, it was rare. There was never any point in putting that sort of money into an out of date CPU and memory architecture.

If you owned a trashcan Mac Pro and were a working creative, would you be upgrading its memory this month? Nope...


I contest this claim. I'm a professional and I know I'm not alone in buying an iMac 5K with 16 GiB or less explicitly to later upgrade to much more with 3rd-party (like Crucial) memory for vastly less than Apple asks.

That said, if the memory is fixed like on the Ultra, then sure, we'll buy what we need (and a bit more).


I think what people are really looking for is expandable memory so that you can buy a base version, and then add cheaper 3rd party RAM.


I’m hoping that the Mac Pro introduces a two tier memory architecture so you can get the best of both worlds.


Just for anyone as out of touch with the macOS ecosystem as me: Accelerate includes a BLAS implementation, so it at least seems plausible (depending on how this library was compiled) that their special instructions might have been used.


If it were taking advantage of Accelerate, the performance would be much higher, but also the scaling would be quite different. Look at the scaling in the tweet you linked--it's anything but linear in the number of cores used.


This linear scaling per core doesn't match my experience with using the AMX co-processor. In my experience, on the M1 and the M1 Pro, there is a limited number of AMX co-processors that is independent of the number of cores within the M1 chip. I wrote an SO answer exploring some of the performance implications of this [0], and since I wrote that another more knowledgeable poster has added more information [1]. One of the key takeaways is that there appears to be one AMX co-processor per "complex", leading us to hypothesize that the M1 Pro contains 2 AMX co-processors.

This is supported by taking the code found in the gist [2] linked from my SO answer and running it on my M1 Pro. Compiling it, we get `dgesv_accelerate`, which uses Accelerate to solve a medium-size linear algebra problem that typically takes ~8s to finish on my M1 Pro. While running, `htop` reports that the process is pegging two cores (analogous to the result in my original SO answer on the M1, where it pegs one core; this supports the idea that the M1 Pro contains two AMX co-processors). If we run two `dgesv_accelerate` processes in parallel, we see that they take ~15 seconds to finish. So there is some speedup, but it's very small. And if we run four processes in parallel, we see that they take ~32 seconds to finish.

All in all, the kind of linear scaling shown in the article doesn't map well to the limited number of AMX co-processors available in Apple hardware, as we would expect the M1 Ultra to contain maybe 8 co-processors at most. This means we should see parallelism step up in 8 steps, rather than the 20 steps shown in the graph.

Everything I just said is true assuming that a single processor core, running well-optimized code, can completely saturate an AMX co-processor. That is consistent with the tests that I've run, and I'm assuming that the CFD solver he's running is well written and making good use of the hardware (it does seem to be doing so from the shape of his graphs!). If this were not the case, one could argue that increasing the number of threads could allow multiple threads to more effectively share the underlying AMX co-processor, and we could get the kind of scaling seen in the article. However, in my experiments, I have found that Accelerate very nicely saturates the AMX resources and there is none left over for further sharing (as shown in the dgesv examples).

Finally, as a last note on performance, we have found that using OpenBLAS to run numerical workloads directly on the Performance cores (and not using the AMX instructions at all) is competitive on larger linear algebra workloads. So it's not too crazy to assume that these results are independent of the AMX's abilities!

[0] https://stackoverflow.com/a/67590869/230778 [1] https://stackoverflow.com/a/69459361/230778 [2] https://gist.github.com/staticfloat/2ca67593a92f77b1568c03ea...
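(For anyone who doesn't want to open the gist: the call it makes is essentially the following. This is a simplified sketch rather than the gist's exact code, and the 8000x8000 size is just an arbitrary stand-in for "medium-size".)

    /* dgesv_demo.c -- build with: clang -O2 dgesv_demo.c -framework Accelerate */
    #include <Accelerate/Accelerate.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        __CLPK_integer n = 8000, nrhs = 1, info = 0;
        double *A = malloc(sizeof(double) * n * n);
        double *b = malloc(sizeof(double) * n);
        __CLPK_integer *ipiv = malloc(sizeof(__CLPK_integer) * n);

        /* fill A with something diagonally dominant so the solve is well conditioned */
        for (long i = 0; i < (long)n * n; i++) A[i] = (double)(i % 7) / 7.0;
        for (long i = 0; i < n; i++) { A[i * n + i] += n; b[i] = 1.0; }

        /* LU-factorize and solve A x = b; the SO answers above attribute the heavy lifting to the AMX units */
        dgesv_(&n, &nrhs, A, &n, ipiv, b, &n, &info);
        printf("info = %d, x[0] = %g\n", (int)info, b[0]);
        return 0;
    }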


But that makes me think: what prevents people from running these calculations on the GPU? Even the memory is shared, and the few hundred GFLOPS they get out of the M1 Ultra is pocket change in GPU terms.


Apple GPUs don't support FP64 at all - and Apple only provides Metal as a (supported and not deprecated) API.

NVIDIA GPUs have much more ubiquitous programming models, from CUDA C++ to OpenMP (both C++ and Fortran), allowing them to be much more useful.

AMD ROCm (a CUDA clone, but not managed too well...) isn't in a great state today, but it definitely provides an OpenMP compiler, though for C++ only; Fortran is not covered.

Intel GPUs have a much better SW story on all fronts for GPGPU than AMD's, and OpenMP is supported for both C++ and Fortran there.

tldr: if you want to run this code on GPUs, the two vendors to look for are NVIDIA and Intel. The others just don't have the software stack to do so (without more heavyweight porting).

Of course, optimising for GPUs also adds work. That said, an OpenMP version is a very good starting point.
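For a concrete picture of what "an OpenMP version" means for GPU offload, here is a minimal sketch using the standard target directives (the SAXPY loop is just a placeholder for the real solver kernels, and whether it actually reaches a GPU depends entirely on the vendor toolchain, per the above):

    /* saxpy_offload.c -- needs an OpenMP compiler with offload support,
       e.g. nvc from the NVIDIA HPC SDK or icx from Intel oneAPI */
    #include <stdio.h>

    void saxpy(int n, float a, const float *x, float *y) {
        /* map x and y to the device, run the loop there, copy y back */
        #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }

    int main(void) {
        enum { N = 1 << 20 };
        static float x[N], y[N];
        for (int i = 0; i < N; i++) { x[i] = 1.0f; y[i] = 2.0f; }
        saxpy(N, 3.0f, x, y);
        printf("y[0] = %f\n", y[0]);   /* expect 5.000000 */
        return 0;
    }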


You can always emulate FP64 - it is not performant to do so, but since CFD is essentially memory-bandwidth-bottlenecked rather than limited by shader performance, this probably is not a big concern. You've got shaders sitting around waiting anyway.
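For the curious, the usual trick is "double-single" arithmetic: represent each value as an unevaluated sum of two floats and build the operations out of error-free transformations such as Knuth's two-sum. A minimal sketch of the addition building block (not a full FP64 replacement, and it has to be compiled without fast-math, which would optimize the error terms away):

    /* a "double-single" value: hi + lo, with lo holding the bits that don't fit in hi */
    typedef struct { float hi, lo; } dsfloat;

    /* Knuth's two-sum: returns s and err such that a + b == s + err exactly */
    static dsfloat two_sum(float a, float b) {
        float s   = a + b;
        float bb  = s - a;
        float err = (a - (s - bb)) + (b - bb);
        return (dsfloat){ s, err };
    }

    /* add two double-single values, renormalizing so the result stays a valid pair */
    static dsfloat ds_add(dsfloat x, dsfloat y) {
        dsfloat s = two_sum(x.hi, y.hi);
        return two_sum(s.hi, s.lo + x.lo + y.lo);
    }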

The M1 architecture only allows the CPU to access a fraction of the memory bus (it is around 25% of the total bandwidth or so iirc). The rest has to go to the GPU or NPU or one of the other units, so even if your FP64 emulation were super shitty, with no hardware support, on paper you should be able to quadruple performance by utilizing it.

If you can "pre-digest" it at all, and then push it back from the GPU to the CPU, that might be another viable approach.

Of course none of this will exist out of the box in some 90s FORTRAN code!


> You can always emulate FP64

That's what Intel does for their newer consumer GPUs AFAIK: zero FP64 units present on die, but an emulation mode can be enabled to expose that support (which is then done in software on the GPU).

> it is around 25%

50%, ~ 200GB/sec on an M1 Max out of ~ 400GB/sec


Well that particular NASA code is from the 90’s and in Fortran. Maybe with MPI. That’s probably part of the reason why.


Don't all high performance math libraries have the option of LAPACK interfaces?


Because GPUs want all of the calculations to be the same type of calculation. If you have a problem where you need to add 1 to all of the elements of a matrix, then GPUs are great. Games are a great example because you can take one pixel and calculate its values over time without much worry of what happens around it. Similar story for neural nets.

For CFD it is a different problem. In each time-step you are essentially solving a linear system Ax=b, so even if you are looking at element x(100,100,timeperiod2) you also need to know the value of element x(1,1,timeperiod1).

There are some algorithms that decouple the problem by introducing residuals for each element and then trying to iteratively reduce them, but as you can see, it is not the linear increase in speed that someone would have expected by looking at the GPU specs.

TLDR: Yes, you can do CFD with GPUs, but don't expect miracles.
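To give a flavour of the "introduce residuals and iteratively reduce them" idea, here is a minimal sketch of one sweep of the simplest such scheme (plain Jacobi on a generic dense system; real CFD codes use far more sophisticated sparse and multigrid solvers):

    /* One Jacobi sweep: every x_new[i] depends on all the other unknowns in row i,
       which is exactly the coupling that makes CFD harder to map onto a GPU than
       "add 1 to every element". */
    void jacobi_sweep(int n, const double *A, const double *b,
                      const double *x, double *x_new) {
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++)
                if (j != i)
                    sum += A[i * n + j] * x[j];
            x_new[i] = (b[i] - sum) / A[i * n + i];
        }
    }
    /* The caller repeats sweeps, swapping x and x_new each time, until the
       residual ||Ax - b|| stops shrinking. */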


using GPUs for CFD is pretty standard these days and has been for a few years. GPUs are fantastic for solving linear systems.

Nobody uses GPUs for "linear speedups by looking at GPU specs". That's not how accelerators are measured.


"using GPUs for CFD is pretty standard these days and has been for a few years. "

It is not standard in OpenFOAM (only through RapidCFD) or Ansys (only this year they made an announcement about full acceleration).

"GPUs are fantastic for solving linear systems."

*Some* linear systems. They are horrible for sparse algebra.


If a CPU doesn't run the OS and software I need to run, it might as well be a GPU or a Cray 9 or something dug out of the wreckage at Area 51. All of which are good descriptions of a CPU available only on the increasingly-proprietary Mac platform.

So this entire thread is kind of pointless with regard to many real-world use cases.


Saying nothing in so many words.

> “If a CPU doesn't run the OS and software I need to run, it might as well be a GPU or …”

> So this entire thread is kind of pointless with regard to many real-world use cases.

Does this kind of reasoning apply to everything that doesn’t suit your unique personal needs?

Personally I think this is an interesting thread, but I don’t use the qualifier of “is it useful to me” to determine whether something is interesting.


I know you're getting downvoted but I feel more or less the same. That's why I am following the Asahi linux folks so closely. I really want them to succeed and for their work to make it possible for Fedora or Ubuntu to be a seamless experience on the M-* Apple hardware.


Alternatively, your OS of choice may as well be a Cray 9 OS if it doesn't run on modern high-end hardware.

Philosophical purity rarely provides top performance. And that's OK, slow and steady works for me, because I'm not doing max performance computing.


Came here to say the same. I don’t know of any computationally heavy engineering software available for the Mac. Unfortunately I’m stuck on Windows for all my software.


Well, the Apple Silicon Macs just arrived. If they are a compelling architecture for engineering software, then sooner or later a company will support them. Also, most Unix-based software should be rather easily portable, especially programs for computations which do not depend on a GUI. There is a lot of engineering software which is only available on Unix, and these days mostly on Linux, as it comes from times when Windows and PCs were not even remotely a consideration as a platform.


That's good info, but omg, why a tweet of a crop of a screenshot of a document?


> If we rescale (by double!) the vertical axis to fit here, a pretty amazing picture emerges:

That second chart with the "amazing" redline is comparing it to 2010 through 2017 era CPUs. Now compare it to a current generation zen3 based $599 AMD CPU in a $350 motherboard with $450 of RAM.

Or a $599 Intel 12th-gen "core" series.

For $3999 nevermind $7999 you can build one real beast of a workstation that fits in a normal midtower ATX case.


It's funny to read folks on HN, where the audience is likely in the top echelons of earning percentiles, complain about a bespoke vertically integrated solution like the Apple computers. Sure you can build a bespoke, quiet, watercooled or aircooled system for a fraction of the Mac Studio, but that's not the audience the thing was designed to target. Its existence does not prevent one from doing just that, finding parts, and building a PC oneself. This is for folks that need OSX, that like the build quality, and want ARM performance without the hassle. :Shrug:


> It's funny to read folks on HN, where the audience is likely in the top echelons of earning percentiles, complain about a bespoke vertically integrated solution like the Apple computers.

Just because you can afford an absurd $4000-8000 Apple computer doesn't mean you want to waste your money on it. I have a finite amount of money and choose to spend it on other things.

Apple is fine for like, a $1500 laptop. The macbook air is a fine product.

Not for an actual workstation.


Depends on how much you value the exclusive[1] ability to run OSX.

[1] hackintoshes need not apply.


Or you can buy one off the shelf from, say, Puget Systems.

They're not vertically integrated, they're sourcing and assembling the same parts you and I could get (and some we can't: they have graphics cards in stock!)

That's for folks that don't need OSX, that like build quality in the form of sturdy, repairable, maintainable tools rather than glossy but glued-together disposable ones, and want Ryzen or Intel performance that will beat a Mac Studio without the hassle.


The 28-core Xeon W-3275M is similar or even slightly better than either the 5950X or 12900KS on multicore workloads, so it won't change the picture much. Overall, M1 Ultra CPU is comparable to a Zen3 32-core Threadripper if you aggregate over various workloads, with the difference that the Ultra is a more flexible architecture and thus will be more capable in mixed scenarios.

Edit: I'd say that the Ultra is a much better bang for your buck than the Threadripper, unless you absolutely need humongous amounts of RAM or macOS does not work for you.


The other difference is that the M1 Ultra uses less than one quarter the power.

Over a few years of usage, that cost is not insignificant, especially if you are using air conditioning part or all of the year.


So far, Apple is the only vendor that makes really pragmatic hardware with true linear scaling. Cache sizes, RAM, power consumption, compute cluster configurations — everything makes sense and everything is balanced. It's a really refreshing take compared to the mainstream PC industry that has predominantly relied on increasing power consumption to improve performance in recent years (AMD being a noteworthy exception, but they are struggling too).

Of course, the argument can be made — and it is not without merit — that power consumption is secondary to performance for a power or professional user. But the low power consumption of the M1 series allows Apple to deliver pretty much unprecedented performance for the given size. The Ultra offers the performance of a large workstation tower in a more compact form factor than the smallest HTPC. My M1 Max 16" has the performance of a large workstation laptop with the portability and battery life of an ultrabook — and I can use all that performance while working untethered. It's quite interesting to see all these benchmarks where folks show that the latest Alder Lake laptops can marginally outperform the M1 Pro/Max on the desk, under ideal conditions, while in reality the performance will plummet really fast when you actually want to be mobile. In the meantime I enjoy my desktop-level build times while working on my sofa or on the train.


Low power consumption is so important to me, because the best thing about my M1 over my previous x86 MBP is that it doesn't overheat and melt down and freeze up all the time, rendering all that expensive CPU and GPU power and memory and pixels and other hardware completely useless.

I've had to take frozen peas out of the freezer and rub them desperately against the bottom of my x86 MBP to quickly cool it down enough to use, in order to make important Zoom meetings on time. It totally ruins the peas, and can't be good for the computer!

Even if they were both the same speed, it would still be so much better, just because of how cool it runs. How fast and long it runs on batteries is just icing on the cake of not getting fucked by speedstep's "kernel_task % CPU 798.6" all the time.



Sure, but how many watts of power would those beasts need?


As much as it needs. My time is worth a lot more.


You basically just said, "I don't care about the planet or any of the people that live on it, as long as I can get my work done slightly faster".

The power requirements of the machine you describe would easily be more than double that of the Mac, but you probably aren't getting anywhere close to double the performance.


No one who cares about the environment should buy a Mac, where you can't upgrade anything and the entire machine is e-waste when a single component gives up the ghost. I like Macs and the Apple Silicon CPUs but let's be honest here.


No one just throws away a perfectly functioning, fully supported Mac. They hold their value well enough that an “upgrade” is selling your old computer and buying a new one.

Besides, if you care about performance, you’re going to want to do more than just replace the RAM


You completely ignored the part of the sentence where I said "when... a single component gives up the ghost"


No one throws away an entire Mac when a single component fails either. They get it fixed.


>I don't care about the planet or any of the people that live on it, as long as I can get my work done slightly faster

Unless you live in a cave and use your excrement as manure, I could apply the same logic to your lifestyle.

Realistically the only practical difference is virtue signaling (I've seen a bunch of hipsters bring up these kinds of irrelevant talking points while discussing their vacation in some exotic destination 5 minutes later).

And like someone else said, Apple devices are the definition of throwaway consumer products designed for a limited shelf life.


A difference between 200W and 500W for a single computer in your home isn't going to save the planet.


It's barely going to make a dent in your energy budget: running a 500W computer for 8 hours uses 4 kWh (assuming it's pegged at 100% CPU and GPU continuously, which it's not), which is about the same as driving 5 miles or turning down your A/C by 2 degrees.


That is not what he said at all. That's a perfect example of a straw man. Apple has brainwashed you so much that you think you are also saving the planet with its 'amazing' machines. You are not. Collective power saving from computers alone is a drop in the ocean of what all those mega corporations could do if they would just stop pointing the finger for waste management and energy saving at customers.

I wish I had a 5950X that requires less power. I don't. I am going to choose this CPU over any Mac any day because it's faster. A lot of other CPUs consume less electricity; I don't use them either. End of story.


Maybe the user lives in a place with 100% hydroelectric power.


meanwhile my car's engine is rated for 200kW


I'm gonna hyperventilate to emit more CO2 just to spite you.


Time would be a primary reason to not build your own machine, though.


Who says it is my time? You can just buy custom workstations.


OP did. The original comment was about building your own machine for the cost of a Mac studio.


It's not the same kind of time saving. When I'm working on a particular task, having a fast machine lets me stay in the flow with good iteration speed. Realistically I only have X amount of concentrated effort available, and that X is < 8h; building a PC doesn't really bite into that time/energy. But if it's a chore for you - you can just buy prebuilt or even rent bare metal.


No need to assemble. There are services for that.


You need to add the cost of your time into sourcing those parts, assembling it all, troubleshooting issues and then any long term warranty issues that come up.


Just pay Microcenter their $200 fee to do that for you.


My local computer shop did it for $30. It's $60 per hour, and it took them 30 minutes.


To do all the parts research, order them all, build it, troubleshoot, and install the OS? Do you get a warranty?


I got a warranty, yes. I live in Australia, so there's a government-enforced 2-year minimum warranty period for durable goods such as electronics.

PS: I once worked on a project where 10 consultants sat around burning money for over a month because the customer couldn't figure out how to spin up a lab environment. We walked down to the local computer shop across the street, ordered the beefiest Xeon workstation tower we could build, packed it with drives, put VMware ESXi on it, and then the work could finally start. The "research, ordering, build, and troubleshooting" took half a day. It saved hundreds of thousands of dollars of lost time and money.


I was running a Lenovo P15v with Fedora before changing over to an M1 Pro MacBook. The hardware is nice, but to be honest it's not much of a speedup over my Lenovo, so I'm giving it more time.

Docker is definitely a lot slower

I have read of massive performance improvement with Linux on m1pro though so it might be a swap to that when more distros are available.

Question is, has MacOS become bloated or not had attention to performance to make best use of the new hardware?


Are you running x86 containers or ARM containers? I’m on a 64GB M1 and my x86 container build takes 15 mins from scratch on the M1, 10 mins from scratch on my 2019 Intel MBP, but the ARM container build takes 6 mins on the M1. With or without the Docker pain the battery life is so worth it. Can spend 2 days working from the couch without plugging it in.


Wow...I mean, I love the battery life in my 16" M1 Max Pro. Similar specs to yours...64GB of ram and 8TB ssd...but I've never seen two days of use. Not even really a full day...I can get around 6-8 hours on the couch without plugging in.

Huge, and I mean HUGE improvement over my previous 16" Intel, which seemed to make it around 3 hours before calling it a day...


It really depends on your load. I run Safari, a terminal, VS Code, and I turn off Docker and Slack unless I actively need them. Runaway node processes are the biggest battery offender, so those I kill off if I notice them in Activity Monitor. I find 2/5 brightness to be the most comfortable in my space -- I'm surprised to hear others run max brightness which is eye-scorching! It all depends on ambient light I guess.

Also, 6-8 hours is a full work day of laptop time for me. If you have more hours in the day you probably will get less days out of the laptop than me.


That’s crazy, I tend to make it 6-8ish hours with my current 16” Intel MBP, depending on how hard I’m driving it. I usually have about 200 tabs open (don’t ask) in Safari, a few terminal windows(some ssh’ed in to the server), vpn running, maybe an IDE, and a couple other odds and ends.


I have a similar machine (1tb SSD instead) and I'm also getting about 6hrs although I run my machine on max brightness. If I turn it down to half brightness I might be able to break 8hrs but I have no idea how people are getting ridiculous 10-12hrs on battery.


The battery life really is amazing. This is the sort of thing that really takes a laptop’s UX to the next level, while still being highly performant.

Plus for PC gaming I’ve been using a cloud server via Paperspace and I play games like Cyberpunk with 4K on ultra settings with pretty good latency (thanks to 1.5 gigabit fibre) and even some VR games using Steam VR + Parsec on a Quest 2.

I really don’t see the point in building a PC anymore. And I was really really close to building a Ryzen one before I discovered Paperspace/https://shadow.tech/


2 days? I'm like some of the others posters. I regularly get to around 30% after 4 hours so I would guess I would get 6 to 8 hours on a single charge.


> Docker is definitely a lot slower

Docker on macOS is busted. I have heard that folks have had much better success with alternative implementations such as nerdctl. M1 virtualisation in itself is very fast and has almost no overhead.

> I have read of massive performance improvement with Linux on m1pro though so it might be a swap to that when more distros are available.

I wouldn't hold my breath. Doubt that M1 distros will ever become more than an impressive tech demo.

> Question is, has MacOS become bloated or not had attention to performance to make best use of the new hardware?

macOS will always be "more bloated" than a basic Linux installation; it's an opinionated, fully featured user OS that runs many more services in the background, e.g. code verification, a filesystem event database, full-disk indexing, etc. But the CPU/GPU performance is generally excellent. Of course, it boils down to the software you are running: if it is a half-assed port (as Docker on Mac seems to be), it will eat up any advantage the hardware offers.


It's worth at least setting up a Linux VM for Docker. The performance improvement is huge and well worth the hassle. This works today; you don't necessarily need to switch to running Linux on the metal. I don't think it's really about macOS, I think Docker Desktop just sucks.


This is quite surprising and interesting to me. Can you provide links where I can read about this and/or try getting it set up for myself?


Multipass is another alternative to running Docker Desktop:

https://multipass.run/docs/docker-tutorial

And I second the other commenter's recommendation of VSCode's Remote Container extension.

https://code.visualstudio.com/docs/remote/containers

And possibly using Podman instead of Docker.

https://opensource.com/article/21/7/vs-code-remote-container...


I don't have any links handy, but I bought a copy of Parallels, set up Ubuntu, and installed docker inside normally. On my host machine, I use the docker client (only the client, no engine) with DOCKER_HOST set to the VM's hostname. VSCode's Remote Container extension is capable of being set up this way, too, so I can attach it to a running container in the VM. It's definitely more work to set up than using Docker Desktop, but I found the performance improvement to be worth it.


Even Docker Desktop runs the same way, i.e. they are running Docker in a Linux VM. It's hidden from the user, but that Linux VM can be accessed.

For improving performance, you can enable some of the newer experimental features in the latest Docker (Virtualization Framework and VirtioFS). The combination works really well except for databases [1] due to the way FS sync is handled. To fix that, a setting needs to be changed in the Linux VM that Docker uses [2]. Hopefully Docker will make that a default setting in the future.

[1] https://github.com/docker/roadmap/issues/7#issuecomment-1042... [2] https://github.com/docker/roadmap/issues/7#issuecomment-1044...


I can't speak to GP's claims about performance because I never used Docker Desktop, but I use lima [1] with colima [2] as my docker host. It's dead easy to set up: `brew install colima` followed by `colima start`.

1: https://github.com/lima-vm/lima

2: https://github.com/abiosoft/colima


How do you find performance with these options?


Perfectly acceptable! But, I don't have high standards. My reason for using an M1 MBP is mostly the insane battery life.


> Question is, has MacOS become bloated or not had attention to performance to make best use of the new hardware?

Docker is just... faster on Linux. This has been the case for a while, and it's not just kernel-based stuff causing problems: APFS and MacOS' virtualization APIs play a pretty big role in weighing it down.

I'm kinda in the same boat, though. I got a work-issued Macbook that kinda just sits around, most of the time I'll use my desktop or reach for my T460s if I've got the choice. Mostly because I do sysops/devops/Docker stuff, but also because I never really felt like the Mac workflow was all that great. To each their own, I guess.


I don't think it's the virtualization API -- if you run docker in a Linux VM under macOS, it's vastly faster than running it via Docker Desktop. I use a Parallels VM for it. I honestly don't know what Docker Desktop is doing that makes it so much slower, even when you're not mounting anything from the host.


Are you mapping your source code from the host to the vm?


No, but I wasn't doing it in Docker Desktop either. That was the first thing I eliminated as a consideration, since that's an obvious point of slowdown. I assume it would be even slower if I had been doing that. Even a `docker build` which does not mount anything from the host was much slower under Docker Desktop than docker under a custom Parallels VM.

The downside, of course, is that it's a little annoying to keep your code in the VM if you'd otherwise have it on the host. But I still find it worth it because of just how much faster it is. Now that I think about it, I wonder how well it would work to do your own host mounting using either the Parallels guest folder sharing feature or SMB. I never tried it.


Docker is a lot slower on OSX. If you switch to ARM images that helps (especially stability).

A bigger impact is using the virtiofs (beta) file system mapping/sync. The existing one is horrendously slow to the point of being unusable.


Docker on MacOS has always been a problem. On Linux it's essentially free, on MacOS you're running a Linux VM and have overheads of moving data between that and your main OS.

MacOS is somewhat bloated though. There's insane amounts of garbage running in the background.


> on MacOS you're running a Linux VM and have overheads of moving data between that and your main OS

MacOS is a special beast, because if I run a VM myself inside Parallels using Docker (Ubuntu Server with the Docker snap, for example), I get nearly 5x the performance from that VM compared to the "regular" Docker for Mac (including when all of the new Mac-specific experimental options are enabled for better performance).

It's pretty atrocious, I find it completely unusable and just stick to running my containers in Parallels and forwarding the ports.


That's the thing. Docker for Mac comes with atrocious filesystem sharing between the host and the VM, which kills all performance. Your Parallels setup doesn't share the FS, and thus is much, much faster.

I don't get why people still bother with docker for mac...


Docker is doubly slow on an M1 Mac compared to Intel Linux.

First, you run a virtual machine vs. just running natively on Linux.

Second, you emulate x86; and unlike other x86 code that is translated directly on the M1 (via Rosetta), here it is emulated in software through QEMU, because the M1 cannot virtualize and run x86 at the same time.

So it’s just really slow.


It would be interesting to see a more serious effort by others to put together a high performance SOC system for gaming or workstations.

I'm writing this on a cheap Core i5 system with Intel Iris Xe graphics. Mediocrity is the name of the game for this type of laptop. Everything is mediocre. The screen, the performance, the touchpad, etc. The only good thing about this system: no blazing fans. That seems to be a thing with most SOC-based PCs/laptops: mediocre performance. Non-SOC-based solutions exist of course, but they suck up a lot more power, to the point where laptops start having cooling issues and have to do thermal throttling. I've experienced that with multiple Intel-based Macs. You spend all that money and the system almost immediately starts overheating if you actually bother to use what you paid for.

I actually used to have a wooden plank that I used to insulate my legs from my MacBook Pro. It was simply too uncomfortable to actually use on my lap.


Theodolite is an awesome app (that I hardly ever need to use).

This guy def knows his math (and working with Apple apps).


Doesn't this just suggest they are leaving some performance on the table, then? The reason the Intel processors scale non-linearly is that they run each core faster when there are fewer cores under load.


Dunno, looks like a classic memory bottleneck to me. The M1 Ultra has 800GB/sec, but I believe a bit more than half of that is available to the CPUs. The rest is for the GPU and various on-chip accelerators.

So with about half the cores (16 vs 28) and twice the bandwidth (say 420GB/sec vs 180GB/sec) it manages twice the performance. Looks pretty impressive to me. Looks like the Apple chip is significantly less memory-bottlenecked than the 6-channel Xeon W.


Only in a certain sense. You can only turn the voltage so high before things melt, and you can only turn the clocks so high until you crash, and then that's basically it, you can't go any faster.

The perfect core could be dialed from fanless to water cooling with linear performance, but it doesn't quite exist today. Intel chips have top end, Apple chips have low end, but an i7 fed 3W isn't going to perform and an M1 can't take any more voltage.

The tension is figuring out how to build very large structures to perform useful work with gobs of power, yet still scale down to low power budgets. Imagine a core that can dynamically morph from P core to E core & back on the fly.


So a new discovery of Amdahl's Law?

https://en.wikipedia.org/wiki/Amdahl%27s_law
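For reference, Amdahl's bound for N cores with parallel fraction p (plugging in round numbers purely as an illustration):

    S(N) = 1 / ((1 - p) + p / N)

    e.g. p = 0.95, N = 20  =>  S ≈ 1 / (0.05 + 0.0475) ≈ 10.3x,
    and never more than 1 / 0.05 = 20x no matter how many cores you add.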


I need a new computer. The smart move, especially considering the cost involved, is to go for what I know: a decent enough desktop build with 3 monitors, Linux/Windows dual boot. What I've used for years.

But it must be said, I've been really tempted by the Macs. I'm not sure why; the 3 main things I do with my personal computer (game dev, playing games, watching things) are things that Linux/Windows can do at least as well as, if not better than, a Mac, and yet here I am, holding off for months just trying to be convinced into the Apple ecosystem.

I think it's probably just the simplicity of it. I really like the idea of replacing a load of bulk (3 monitors, the VESA mount to hold them, big keyboard, mouse, bulky tower and a metric tonne of cables) with just a single laptop that I can pick up and go with at a moment's notice, though I'm not sure of the practicalities of that, at least for me. I don't even travel much, it just seems nice.

Edit: I know that a lot of people have a mac and a desktop to fill all needs, but that somewhat defeats the simplicity of it for me. One bulky computer is simpler for me than one bulky computer for some things and another much smaller computer for others.


But the screen real estate isn't even close between three monitors and a laptop. A nice middle ground is a laptop with a dock and monitors; that way you have the external screens when you need them, and can also be on the move, on the same computer.

Of the three main things you do, I think only watching things is comparable between macOS and Windows/Linux. Gaming is nonexistent on macOS, unless you stream (either a cloud service like Stadia/GeForce Now or locally from a PC with Steam in-home streaming/Parsec), and I can't imagine doing game dev somewhere you can't even test.


> But the screen real estate isn't even close between three monitors and a laptop

Absolutely, which is a large part of the reason I'm not sure it's practical for me, but it's certainly true that one monitor is simpler than three. I suppose I could eventually get used to just having one monitor, it might even have some advantages, but I'm not sure I'd want to.

> A nice middle ground is a laptop with a dock with monitors

This is what I've currently got (laptop is a razer blade). In my case at least it just ended up being more messy and difficult than a desktop. Most docks for example are designed for business use and therefore don't provide "clean" USB power, which I found causes issues with audio interfaces and DACs. I just ended up having a small, poorly cooled, difficult to upgrade desktop.

After that, I decided no more half measures. I either embrace the minimalism of a laptop alone, or return to a desktop. I know a dock works well for some people, sadly not for me.

> Gaming is nonexistent on macOS... I can't imagine doing game dev somewhere you can't even test

This is exactly it (plus game dev for Apple silicon is still somewhat lagging behind); it doesn't seem to be a practical option, and yet here I am still.

Of course there are also non-Apple laptops which would provide similar simplicity. I love my ThinkPads, but I'm just not sure they're powerful enough for me to replace a desktop with, and as for the more powerful of the non-Apple laptops, the Blade has given me a mistrust of them (I've only had the thing for a year and the battery has bulged beyond being able to fit in the chassis).


Regarding powerful laptops, I've heard only good things about the Asus ROG Zephyrus line.


Yeah, same here, but something about them puts me off a bit. Maybe it's something stupid like the looks, or the fact that I never see them used/mentioned in a professional setting (as opposed to Macs or the Framework laptops).



That's awesome. Get a law named after yourself by taking an old paper and just saying "nuh-uh, the naive analysis is better".


Congratulations. Announcing the new Lupire’s law.


Are these things good for tensorflow and training 50GB AI models? Or is it better to stick with nvidia?


Can someone please project that curve out and estimate where it starts to flatten?


That's not very useful. That curve might be a combination of a few different bottlenecks: L1 cache misses, scheduling bubbles, efficient scheduling of instructions within a certain window, L2 cache misses, cache contention in L1 or L2, AMX utilization, memory utilization (latency- or bandwidth-limited), MMU contention/page misses, etc. etc. etc.

The percentage of all those can change as you change the number of cores, and the different levels of the memory hierarchy get different levels of contention, latency limits, or bandwidth limits. So in an ideal world you can draw a graph and extrapolate, but in the real world you might do significantly better or worse. There are cases (admittedly rather rare) where performance increases more than linearly with the number of cores.


> There are cases (admittedly rather rare) where performance increases more than linearly with the number of cores.

Ooh that's interesting, do you have any leads or examples?


It's called superlinear scaling. If you have a fixed-size problem that doesn't communicate much and is CPU intensive, you can run on a single core and be limited by memory bandwidth or latency. But as you add more cores, each core's share of the problem shrinks until it fits in L3, L2, or even L1.

It's sometimes seen even on multi-node jobs. It's definitely the exception rather than the rule, but it does happen. It's helped by the fact that each level of cache is approximately 10x faster, so fitting in a cache is a huge win: running in RAM on 1 node might be 20x slower than running in cache on 10 nodes.


Could these processor leaps in performance inadvertently help us stop burning up so much coal in the quest for Bitcoin?


No. The economics of Proof of Work mean that increases in compute per watt just lead to an increase in global hash rate. Total energy usage never goes down as long as it's sufficiently profitable to use the energy.


I'm curious why Geekbench haven't put the Mac Studio on their Mac leaderboard yet - https://browser.geekbench.com/mac-benchmarks. There are plenty of benchmarks submitted https://browser.geekbench.com/search?page=7&q=Apple+M1+Ultra...


There’s a bug in the Browser that we haven’t been able to track down yet that’s preventing the Mac Studio from appearing on the leaderboard.


How could a new computer be so different that it can’t be shown? Aren’t the results plain text? This makes no sense


hence the "we haven’t been able to track down yet", I'd imagine


I suspect because one of the benchmarks is browser based, and that's not working.

It's not the display of the benchmarks that isn't working, but the running of them.


[flagged]


Outside of the rest of the claim not making sense, how exactly would one compare the price of the M1 Ultra to an Intel CPU in the first place? The M1 Ultra doesn't have a price tag, and even if it did, you'd still not have a number you could compare to the cost of a CPU.


[flagged]


It is pretty useless to compare the price of a single CPU with the price of an entire PC…


You're definitely right there. I put together a build with this CPU and chose the most expensive parts available (except the GPU because of the chip shortage, the case because there are $5000 ATX cases for no good reason, the PSU because I just got the best Seasonic one, and the SSD because there are $12k enterprise ones): https://pcpartpicker.com/list/DYxhk9

So that's $3500 without a GPU, buy a $500 used GPU on eBay and you're beating Apple. And, nobody buys $1000 motherboards, so that takes $500 off. You don't need a $300 case. Etc. Basically the point of the exercise is that you can max everything out, and get a faster computer for less money, which is what the comment was trying to say.

Someone will reply and say that your time sourcing and assembling the components isn't free, or that it doesn't run OS X, etc. I get it, you don't have to say that. Just adding an actual computer that's expensive as possible that you could have right now to compare to.


I just got a Titan A200 with a Ryzen 9 5950X. This CPU is really fast, and dissipates only 105W. The Titan workstation is the quietest I have ever had. It's really incredible. Price tag: $3600.

https://www.titancomputers.com/Titan-A200-AMD-RYZEN-Professi...

According to Passmark, the 5950X is beating Intel's 12900K.

https://www.cpubenchmark.net/compare/Intel-i9-12900K-vs-AMD-...


> According to Passmark, the 5950X is beating Intel's 12900K.

As a 5950X owner I thought this sounded off. Looking at the link, it made a lot more sense: it only beats the 12900K in multi-core, but in single-core it is significantly behind. Really it's behind for the first 8 cores, until the Intel CPU starts using efficiency cores. Compared to the M1 Ultra, the 5950X is behind in both metrics at nearly double the power consumption.

Not that the 5950X is a bad CPU, particularly if you need x86 support; it's just not really the chart-topper it used to be against these one-year-newer chips. It'll be interesting if all 3 (AMD, Apple, and Intel) manage to get their next major iterations out by the end of the year and we get a fresh comparison on more even ground.


Fair points. A direct comparison here (https://nanoreview.net/en/cpu-compare/apple-m1-ultra-vs-amd-...) with the M1 Ultra has them pretty much neck-and-neck, not considering power. Considering power, the M1 Ultra is a clear winner.


I think you’re right—we’re still on the first generation of Apple computer chips (and honestly I think the performance is still very impressive, even if overpriced), and there is a lot more motivation now on the others. We might start seeing some big improvements.


This benchmark has the 5950x going up to 194W:

https://www.tomshardware.com/reviews/amd-ryzen-9-5950x-5900x...

Throw in another 40W for the liquid cooler and it adds up.


Though you'd have a possibly noisy large black box, compared to a small, quieter silver box. This is a factor in computer design - cooling is hard. A quiet small box with the performance of a noisy large black box is something a lot of people would pay for (including me).


But why would you buy the M1 ultra model for single core performance?

I think it would be helpful for people to occasionally restate what they think the thesis is.


I really don't know. I suffer in both directions. I build large projects on a daily basis, so picked a Threadripper. It destroys large builds, especially C++ ones. Then I use the same computer to play games, and the CPU can only spit out 300 frames per second when my monitor can display 360, which is annoying. (It's CPU limited, not GPU limited, sadly.) So really, I want both, and nobody has both.

Single-thread performance is going to be especially relevant if you are developing in an older language. I'm always surprised how slow webpack is (and I don't do enough frontend stuff to mandate that people switch to that Go equivalent), for example. If you have good single-thread performance, you make a lot of TypeScript developers happy. If you have good multi-thread performance, you make a lot of C++ developers and gamers happy. So having both would be great ;)


And are you factoring in the power consumption difference ?

That can definitely add up over the years as I've experienced with my 10980xe.

Also resale value is an important factor. You will struggle to sell that PC in the years ahead which will never be a problem with the Mac.


To save you the click: a 12900K machine from Dell is about $3k. If you need CUDA or, say, SolidWorks, get the PC; for video and multithreaded workloads the Mac would probably be faster, but really only benchmarks of your use case can tell you.


So you made a build to essentially show how expensive an M1 Mac is compared to an Intel machine, but left out a critical component because it's too expensive?


apple doesn't sell unbundled chips. Adding a motherboard and RAM would still be less than $4K for most configs.


[flagged]


Umm, running a 1990s Fortran code that's a CFD simulation is a "real" workload. Seems relatively likely that any floating-point-heavy code would act similarly. Hard to say if it's the matrix multiply or the memory bandwidth that's giving Apple such a large lead.

Normally I'd discount using a 28-core Intel CPU from 2019, but from what I can tell Intel hasn't improved much since then. Keep in mind that Intel has a specialized vector unit (256-bit AVX2 or AVX-512 depending on the model), and the listed CPU is pretty high end (with 6 memory channels) where the normal i5/i7/i9 has only 2.

So sure, it's not a video compression, gaming, or web browsing benchmark, but some folks do run floating-point-heavy codes. Unlike CUDA, which requires a rewrite, this code wasn't specifically optimized for the M1.


[flagged]


Ah yes, the bot accusation comment. Everyone's favorite subcategory of HN musing.


Hey man, maybe I'm a bot too. Who knows. Blip bloop. But these are just computer chips, and fresh low-karma accounts with vitriol dialed to the max are atypical of HN. By all means, M1 is the worst, but why froth at the mouth about it in this manner? You explain it.


I don't have to explain anything, HN is based on the ideals of arguing in good faith and not vilifying people for their opinions. If you think they're being harsh, then say that. If you think they're wrong, then refute them. But if you're solely seeking to bash them, why even comment in the first place? How does that advance anyone's understanding of the topic, or sway anyone's belief? It's the opposite of productive conversation, if you think they're encouraging a harmful or counterproductive dialogue then you can downvote them and move on. It's not rocket science.


I specifically wrote that the comment is so harsh it reads like spam/bot. Wasn't sarcastic or trying to bash any humans, sincerely.


Well, I have lower karma than seabriez and I'm neither a bot nor did I think I had low karma points.


For your own sake I really hope you’re trolling and not actually having your brain in a configuration where your comment makes any sense to you.


Do you have any evidence for this statement ("Apple paid them not to"), or are you just making shit up?


[flagged]


Please don’t turn HN into Slashdot.

I regularly downvote comments that make no points, are solely there to make a gag, and add no substance to the discussion.


It's amusing that you're concerned about someone making a joke turning this into Slashdot, because the actual discussion here comparing specs and cost of Intel vs. Mac could be lifted directly from the Slashdot archives circa the late 90s (adjusting for the specs).


Intel wins on single core performance as well as price though.


Spec us out a full comparable system and show benchmarks.



That's kind of disingenuous.

Searching on Geekbench for Apple M1 ultra single core scores returns values mostly in the 1770-1780 range. E.g. https://browser.geekbench.com/v5/cpu/14597244

Most 12900K scores are between 1900 and 2200, but then there is this outlier with a single-core score of 1252: https://browser.geekbench.com/v5/cpu/14572307

Intel certainly wins on single core, but the M1 Ultra multi-core scores are still impressive in comparison, being generally 23-24000, while the 12900K's are around 15-20000.


So 1900-2200 for Intel and 1770-1780 for M1?

Disingenuous would be to focus on the outlier.


An outlier for m1 ultra was what was reported by parent.


Sure, Intel focuses on single-thread perf, high power (241-watt max TDP), and automatically overclocks to 5.1 GHz, but only if you have enough power, cooling, and a bunch of idle cores. Thus the 15% variation in submitted scores. It's also rather memory-bandwidth constrained, and shows impressive numbers with a single core running.

Apple, on the other hand, doesn't overclock, focuses on multi-core performance, has great memory bandwidth, and all the submitted scores are within 1%.

The M1 Ultra is also 1.32x faster in the multiprocessing benchmark. Looks pretty impressive to me, even ignoring how much less power the M1 Ultra uses.


You can make a system with a 12900K, 32GB of RAM, and a 1TB NVMe SSD for $1150: https://pcpartpicker.com/list/m3QjZw


I'll never understand religious loyalty to a corporation.

The computer in the OP is fully assembled and has 128GB of memory, a nice GPU, and an 8TB SSD. Why be so obtuse?

If you're just trying to compare to the entry-level Mac Studio, at least have the decency to throw in a full parts list, like, you know, with a graphics card...


Why do you need a dedicated graphics card for CPU compute? If GPU is at all your concern, you're not really interested in an M1 Ultra nor in a 12900K anyway. The CPU itself includes a GPU that is sufficient for anything except GPU-heavy workloads, so the parts list is fully complete.

The M1 Ultra version starts at 4000$ dollars with a 1TB hard drive and 64GB memory, but I thought a 32GB option was available, so that's what I went with. By all means add another 32GB for 120$, that's why I sent a link that's fully configurable.

As for a parts list, I sent one because the request was to spec out. If you actually go look for a fully assembled machine you are likely to find one with a similar price.

If you want to add an RTX3070, 256GB of RAM, and an 8TB SSD, you can go ahead, it will still be over a thousand dollars cheaper than the corresponding Mac Studio. Again, that's why I sent the link, so you can play around with it.

Lastly, I don't see how this is any kind of loyalty to a corporation. It's simply pointing out the truth that the computer in question is just very expensive if you care about performance. As far as I'm concerned, it's the attempt to deny the obvious (that the M1 Ultra is slower, at multiple times the price, in workloads that can't use 16 cores efficiently) that looks like religious loyalty to a corporation to me. Especially when the corporation in question does what it can to lock you in, which just isn't the case for AMD or Intel.

If you really need 4 more cores, don't care about single-core performance enough to go with a 12900K, yet care about it enough not to go with a Threadripper, to such a degree that you can justify spending $2,000 more, and don't care about GPU performance either, then sure, the Mac Studio is for you. Otherwise, it only makes sense if you love macOS to the tune of thousands of dollars (or are locked into the ecosystem). There's nothing wrong with acknowledging it's an extremely niche product that is almost never justifiable on performance grounds.


I think it is important to compare comparable things.

> If you want to add an RTX3070, 256GB of RAM, and an 8TB SSD, you can go ahead, it will still be over a thousand dollars cheaper than the corresponding Mac Studio.

If you want something comparable to a Mac Studio you would need to add those things. I'm glad you managed to agree to that. A thousand-dollar delta on a $4000 computer is nothing to sneeze at; however, yours is a parts list and a Mac Studio is a complete product.

> Lastly, I don't see how this is any kind of loyalty to a corporation.

It read as if you were carrying water for Intel.

> It's simply pointing out the truth that the computer in question is just very expensive if you care about performance.

People care about all kinds of things. Obviously if you have no software to run that would benefit from the combination of hardware the Mac Studio offers, you would be better off with something else. Horses for courses.

If you enjoy spending time assembling hardware that's great too.

> There's nothing wrong about acknowledging it's an extremely niche product that is almost never justifiable on performance grounds.

Sure.


> If you want something comparable to a Mac Studio you would need to add those things. I'm glad you managed to agree to that. A thousand-dollar delta on a $4000 computer is nothing to sneeze at; however, yours is a parts list and a Mac Studio is a complete product.

There is no corresponding product at that price. The $4,000 Mac Studio has 7TB less storage, half of the memory, and a fraction of the GPU performance. The Mac you're describing costs $8,000, and we're comparing it to a $2,500-3,000 computer.

It's not a $1,000 delta on a $4,000 computer; it's a $2,000+ delta on a $4,000 computer and a $5,000 delta on an $8,000 computer.

> It read as if you were carrying water for Intel.

I honestly can't see how that came across. Someone asked for a spec-out of a 12900K computer, because comparing the price of a CPU to that of a full computer is unfair, so I gave one. I personally wouldn't buy a 12900K myself; it's ridiculously overkill, and on everything except the very highest single-core performance, Intel is currently a worse proposition than AMD.

> People care about all kinds of things. Obviously if you have no software to run that would benefit from the combination of hardware the Mac Studio offers, you would be better off with something else. Horses for courses.

Sure, but the point I'm trying to make is that there is a tiny, tiny, tiny number of use cases where there is a genuine performance advantage. The original assertion that the 12900K (or, in half of the use cases, a Threadripper) is superior in performance is generally accurate for the 80% of people that run single-threaded or lightly threaded workloads on the CPU, and for the 10% of people that only care about multi-threaded workloads a Threadripper is better. For the 10% of people that run a particular mix of both, half of them find the M1 Ultra's GPU a deal-breaker. Obviously it's all dependent on the particular use case; it's just that the cases where the Mac Studio wins on performance are much rarer than most think. There are other valid reasons besides performance, of course.

> If you enjoy spending time assembling hardware that's great too.

You don't have to - pre-assembled computers are generally cheaper than building them yourself these days. The reason I gave a parts list is that it's the fairest way to compare various configurations without listing off 20 different SKUs.


Intel has integrated graphics. If you only care about CPU perf you don't need more for a GPU.


That seems like a poorly thought-out theory, considering the $500 12900K needs 50 watts to beat the two-year-old M1 in single-core performance. Whether you're a laptop user or an enterprise server customer, you care about efficiency.


Where did you come up with the 12900K pulling 50W? The 12900K is a 240W-class processor at full boost, so the M1 is pulling 1/6th the power...

(The 12700K is much more reasonable, but the 12900K and 12900KS are "win the benchmarks" SKUs, and Intel turned everything up to 11 to get them over the top of the 5900X.)


50W is the single core package power at peak boost (~5.1GHz). M1 single core package power is around 4-7W depending on workload.
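
To put very rough numbers on that, here is a hypothetical perf-per-watt sketch in Python using the ballpark package-power and single-core Geekbench figures floated in this thread; none of these are measurements, just the estimates above:

    # Approximate single-core Geekbench 5 points per watt of package power.
    # All figures are ballpark estimates from this thread, not measurements.
    gb5_single = {"12900K": 2050, "M1": 1750}   # rough single-core scores
    package_w = {"12900K": 50.0, "M1": 5.5}     # rough single-core package power (W)

    for chip in gb5_single:
        points_per_watt = gb5_single[chip] / package_w[chip]
        print(f"{chip}: ~{points_per_watt:.0f} GB5 points per watt")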


For a workload that is optimized on M1, for 10 more watts you get way faster and better functionality than a two-year-old M1 that uses 40 watts and can't even export H.264 video faster than a five-year-old computer.


Editorialized title. Original was "2022 Mac Studio (20-core M1 Ultra) Review".


Changed now. (Submitted title was 'Near-linear speedup for CPU compute on 20-core Mac Studio'.) Thanks!


I didn’t submit it, but in this case the original title is so generic that no one would have looked at it, so I'm kind of happy they put the important part in the headline here.


email these in


Could you explain what “these” are that need to be emailed in?


Title corrections. @dang can't read every thread to see these posts, but if you email him, he can take a look.


Makes sense. Thank you for clarifying.


Done! Thank you for the reminder :)


I bet you can build an AMD system that beats this handily and costs half as much.


And the power consumption?


Is an irrelevant metric, like battery life, when doing scientific computation.


Performance per watt is a pretty big metric.

Personally the fact they didn’t produce Linux drivers for these things is baffling. It would just make more people buy them. So what if they don’t run macOS? I would buy an M1 Air tomorrow if they published Linux drivers for the full stack.


I have to think they’re developer- and supply-constrained. They’ll make up those sales once others have cracked the problem.


The richest tech company in the world worth trillions is constrained?


There are always constraints. Usually it's the time and attention of the people who are working on the problem. Throwing more people at the problem can help, but also adds a lot of coordination overhead to the problem. It doesn't matter how many resources you have, hard problems are still hard.


Weird, but true.


I don't believe it, honestly. I would gamble that it's because helping Linux grow doesn't financially benefit them, now or in the future. And that, to me at least, feels quite a bit more likely from Apple.


> Personally the fact they didn’t produce Linux drivers for these things is baffling.

Followed by

> helping Linux grow doesn't financially benefit them, now or in the future

Is a bit perplexing. Why are you baffled?

Opportunity cost is a very real thing. Apple would probably be better served by developing Windows drivers, if they were going to take developers away from macOS drivers.


I didn't write that I'm baffled. That was a different poster.


Oops, sorry, the shared 'f' first character must have thrown me.


You'd lose the bet.

Apple, like every company, is limited by the speed at which their supply chain can produce specific parts. And, while they're often the largest customer for a given manufacturer, they're rarely the only one.


There are literally people working on it right now, for free, whom they could pay and give access to all of the docs for the hardware that they designed in-house.


It's a perennial topic of discussion in the AMD vs Intel debate, so a lot of people do seem to care about it.

And why wouldn't you, when it all becomes heat in your room? I'm not saying every single watt matters, but in cases where you're putting the M1 Ultra at 40W against a 280W Threadripper or Epyc... yeah, that hugely matters.


That's debate. That's just what you and I are doing right now. Useless internet fluff. When push comes to shove, and you really wanna shove something, you're not going to pick the most efficient option; you're going to pick the best pusher available. And right now that's gonna be something by AMD or Intel with one (probably several) Nvidia GPUs.

Or you offload it to GCP/AWS, but then the wattage of the M1 Ultra is still irrelevant, because you could use a Chromebook to do that.


Many people don't care about power consumption precisely because it becomes heat in the room. They live in places where the need for heating is bigger than the need for air conditioning.

In my experience, a ~500 W gaming PC does not generate any noticeable heat unless you use it in a small room behind a closed door. A 1450 W vacuum cleaner does, so I guess heat becomes significant somewhere around 1 kW of sustained power.


As someone who does a fair amount of scientific computation in the field, power consumption as it relates to battery life is not an irrelevant metric.


Do you actually care about power consumption in a desktop computer under very heavy workload?

By the way, the M1 Ultra has a 370W power supply, so really the question is do you really care about the difference between 300 and 400 watts in this use case?


> Do you actually care about power consumption in a desktop computer under very heavy workload?

yes, many people do, especially since that becomes heat in your room.

> By the way, the M1 Ultra has a 370W power supply, so really the question is do you really care about the difference between 300 and 400 watts in this use case?

Complete non sequitur; you could put a 1.5kW power supply on an R7 5700G or an i3, and that doesn't mean it pulls 1.5kW.

You also should be able to express this without the flamebait style; that is not appropriate or welcome on this site. Instead of "by the way..." you can simply say "the M1 Ultra pulls 370W, so..." (or you could, if that were true).


There is no easy way to know how much the Mac Studio can pull without using a wall meter.

We know that reported power consumption is inaccurate on M1 devices, from AnandTech's testing of the M1 Max, and I doubt it's very different on the Mac Studio.

In any case, the difference is not going to be much.

As far as heating a room, really, I can assure you that you will barely notice 100W. There are screens that use more power than that (such as, among others, the XDR screen). I say this as someone who lived with a 900W power hog of a computer in places where the temperature hits 42°C in the shade.
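
If anyone wants to at least see what the on-chip counters claim (with the caveat above that they can under-read compared to a wall meter), something like the sketch below should work, assuming the stock macOS powermetrics tool and its cpu_power sampler:

    # Sample the software-reported power counters via macOS powermetrics.
    # Treat the output as a rough guide only; as noted above, it can
    # diverge from what a wall meter shows.
    import subprocess

    out = subprocess.run(
        ["sudo", "powermetrics", "--samplers", "cpu_power", "-i", "1000", "-n", "1"],
        capture_output=True, text=True, check=True,
    ).stdout

    for line in out.splitlines():
        if "Power" in line:   # CPU/GPU/combined package power lines
            print(line.strip())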


The power supply is rated that high because it is expected to supply power to several USB-C ports.

It looks like the max power consumption of the Mac Studio is 215W: https://support.apple.com/en-us/HT213100


I don't know about the Mac Studio, but the MacBook Pro can exceed its stated power consumption limit. USB-C is a very plausible explanation, but I doubt it has over 100W of USB-C PD budget, and the included PSU is 370W continuous, so it still has headroom for devices.


> but I doubt it has over 100W of USB-C PD budget

The Studio Ultra has six Thunderbolt 4 ports. Each port is required to deliver at least 15W by the TB4 standard, so that’s already 90W. Add some budget for the USB-A ports and you are near 100W for power delivery.

(Probably not much over that though.)
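
For what it's worth, the arithmetic behind "near 100W" works out roughly like this; the per-port USB-A figure is my assumption (the usual 900mA at 5V for USB 3 ports):

    # Rough minimum power-delivery budget for the Studio Ultra's ports.
    # TB4 requires at least 15W per port; USB-A assumed at 4.5W (900mA @ 5V).
    tb4_ports, tb4_w_each = 6, 15.0
    usba_ports, usba_w_each = 2, 4.5

    budget = tb4_ports * tb4_w_each + usba_ports * usba_w_each
    print(f"Minimum PD budget: {budget:.0f} W")   # -> 99 W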


Er, machines in the M1 Ultra class do not have 400-watt power supplies. The 2019 Mac Pro that it's compared to (with 28 cores) has a 1.4kW power supply.

A 12900K has a max TDP (for the chip, not including RAM) of 210 watts, and loses on Geekbench 5 to the M1 Ultra by a third or so. To get closer you'll need an i9 or Threadripper, and systems with either of those typically need quite large power supplies and still have a small fraction of the memory bandwidth.


Can you? How much will you charge to build one for me? Do you offer warranty support?


Not OP, but sure, why not? Just buy, say, a $4000 workstation with support from one of the usual vendors and sell it to you for $4500.

Btw most of that price is for certification for industry software. Does the Mac have that? If not, it is just immediately not an option.


Not with 800GB/sec of memory bandwidth.

Or even the 440GB/sec of memory bandwidth available to the CPUs.

Sure, if you are cache-friendly enough and you get enough Zen 3 or Intel cores you can win, but you end up spending a fair chunk of change and getting less memory bandwidth, and for a clear win you often need to spend more, like getting a Lenovo Threadripper (and they have exclusive rights to the chip for 6 months or something).


> "Nano-texture glass gives up a little bit of the sharp vibrant look you get with a glossy screen, but it’s worth the trade in usability, to be able to see the screen without distractions all day long."

"Nano-texture glass" is pretty much just what all screens were like back in the days of CRTs and pre-glossy flat screens. Now Apple are charging $300 for it!


It is a marketing name, but refers to a special procedure used to create the matte surface, not the fact that it’s matte. By the way, CRTs were glossy. We wiped them with wet towels.


> By the way, CRTs were glossy. We wiped them with wet towels.

I remember the crackling if you wiped them within a few minutes of their last being on.

Nowadays, there are lots of folks who think "CRT" is a political hot-button topic.


> Nowadays, there's lots of folks that think "CRT" is a political hotbutton topic.

Well, who wants big government shoving their noses into how efficient our screens are, and whether they have leaded glass or not. :P


Most CRTs weren't glossy in the same way that modern flat panels are. They diffused reflections: you couldn't see a crystal-clear mirror image of yourself on a dark screen like you can with today's screens.


They did, they were just molded convex lead glass. Maybe the convex part mattered?


Some CRTs also had an anti-reflective coating. My old Eizo certainly did.


"Nano-texture" is different from matte - matte LCDs don't have reflective glare, but they also have much lower contrast and you can see the grain if you look closely. Nanotexture doesn't have those issues, but it's expensive.


That was my thought as well ("yay marketing"), but apparently it's actually structurally different so that it maintains contrast. I ordered one in early March, and if it ever actually arrives I'll try to remember to reply on the visible difference :D (Not kidding: it was due late March, then slipped to April 22-27, and today moved to May 23-??)


I have the nano textured XDR Pro and chose to order the Studio Display without it. I’m a data point of 1 obviously, but it diffuses the image quite a bit in my opinion.


oh huh? :-/ thanks for the heads up


the pixels are WAY smaller nowadays, though; perhaps that constrains the nanotexture fabrication process in an expensive way?



