The previous M2 Ultra topped out at 192GB of memory, or 128GB for the Pro and some other M3 models, which I think is plenty for 99.9% of professional tasks.
They've now bumped it to 512GB, along with an insane price tag of $9,499 for the 512GB Mac Studio. I'm pretty sure this is some AI gold rush.
Every single AI shop on the planet is trying to figure out whether this has enough compute to be a reasonable AI path. If the answer is yes, that $10k is an absolute bargain.
No, because there is no CUDA. We have fast and cheap alternatives to NVIDIA, but they do not have CUDA. This is why NVIDIA has 90% margins on its hardware.
Can you do absolutely everything? No. But most models will run or retrain fine now without CUDA. This premise keeps getting recycled from the past, even as that past has grown ever more distant.
CUDA is becoming more critical, not less, every day. Software developed around CUDA is vastly outpacing what other companies produce. And saving a few million when creating new models doesn't matter; NVIDIA is pretty efficient at scale.
I don't know if you've heard, but NVIDIA is about to add a monthly fee for additional CUDA features, and I'm almost certain that many big companies will be happy to pay it.
> But most models will run or retrain fine now without CUDA.
This is correct for some small startups, not big companies.
CUDA is still incredibly important. It's a huge amount of work to get packages working across multiple GPU backends, and by default everyone still starts with CUDA.
The example I always give is FFT libraries: compare cuFFT to rocFFT. rocFFT only just released support for distributed transforms in December 2024, something you've been able to do since CUDA Toolkit v8.0, released in 2017. It's like this across the whole AMD toolkit; they're so far behind CUDA it's kind of laughable.
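For the curious, here's a rough sketch of the kind of single-GPU call both stacks handle fine, via CuPy (which dispatches to cuFFT on NVIDIA builds and to the hipFFT/rocFFT stack on ROCm builds); the distributed, multi-GPU transforms are where the gap really shows:

```python
# Minimal sketch: a 2D FFT on the GPU via CuPy. On an NVIDIA build this is
# backed by cuFFT; on a ROCm build of CuPy the same call goes through
# hipFFT/rocFFT. Array sizes are arbitrary.
import cupy as cp

x = cp.random.random((4096, 4096)).astype(cp.float32)
X = cp.fft.fft2(x)       # single-GPU transform, fine on both stacks
print(X.shape, X.dtype)  # (4096, 4096) complex64
# The distributed, multi-GPU transforms discussed above are a different story,
# and that's where the CUDA toolkit has historically been years ahead.
```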
The higher-end NVIDIA workstation boxes won't run well on normal 20-amp plugs. So you need to move them to a computer room (whoops, ripped those out already) or spend months getting dedicated circuits run to office spaces.
Hadn't really thought about this before, but that seems to be mainly an issue in North/Central America and Japan. In Germany, for example, typical household plugs are 16A at 230V.
While technically true, NEMA 5-15R receptacles are rated for use on 20A circuits, and receptacle circuits are almost always 20A, at least in modern construction. Older builds may not be, of course.
That said, if your load is going to be a continuous load drawing 80% of the rated amperage, it really should be a NEMA 5-20 plug and receptacle, the one where one of the prongs is horizontal instead of vertical. Swapping out the receptacle for one that accepts a NEMA 5-20P plug is like $5.
If you are going to actually run such a load on a 20A circuit with multiple receptacles, you will want to make sure you're not plugging anything substantial into any of the other receptacles on that circuit. A couple LED lights are fine. A microwave or kettle, not so much.
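Rough numbers on that 80% rule, as a sketch (illustrative arithmetic only, not electrical advice):

```python
# Continuous-load headroom on a 120 V circuit under the usual 80% rule.
# Numbers are illustrative; check local code and an electrician for real work.

def continuous_watts(volts: float, breaker_amps: float, derate: float = 0.8) -> float:
    """Sustained wattage a circuit is meant to carry (80% of breaker rating)."""
    return volts * breaker_amps * derate

print(continuous_watts(120, 15))  # 1440.0 W on a 15 A circuit
print(continuous_watts(120, 20))  # 1920.0 W on a 20 A circuit
# A multi-GPU workstation pulling 1.5-2 kW sustained uses most or all of that
# on its own, which is why the dedicated-circuit question comes up at all.
```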
To clarify, the circuit is almost always 20A, with 15A being used for lighting. However, the outlets themselves are almost always 15A because you put multiple outlets on a single circuit. You'll see very few 20A outlets (which have a T-shaped prong) in residential construction.
Is this actually true? Were people doing this with the 192GB of the M2 Ultra?
I'm curious to learn how AI shops are actually doing model development, if anyone has experience there. What I imagined was: it's all in the "cloud" (or their own infra), and the local machine doesn't matter. And if the local machine did matter, the NVIDIA software stack is too important to give up, especially given that a 512GB M3 Ultra config costs $10,000+.
True. But with Project Digits supposedly around the corner, at a claimed $3,000, with ConnectX support and a Blackwell GPU, what's the over-under on just buying two of those at about half the price of one maxed-out M3 Ultra Mac Studio?
It's half the memory of a max-spec Mac Studio, but also half the price and eight times the memory speed. Realistically, which open-source LLMs does 512GB over 256GB of memory unlock? My understanding is that the true bleeding-edge ones like R1 won't even run well in 512GB, especially with the anemic memory speed.
We really should wait and see what happens when Project Digits is finally released. Also, I would love it if NVIDIA decided to get into the CPU/GPU + unified memory space.
I can't imagine the M3 Ultra doing well on a model that loads into ~500GB, but it should be a blast on 70B models (well, twice as fast as my M3 Max at least) or even a heavily quantized 400B model.
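Back-of-envelope weight footprints, just to see what 256GB vs 512GB actually fits (weights only, ignoring KV cache and runtime overhead, so treat these as lower bounds):

```python
# Rough weight footprints at different quantization levels. Parameter counts
# are the commonly cited totals; KV cache, activations, and runtime overhead
# are ignored, so real requirements are higher.

def weights_gb(params_billion: float, bits_per_param: float) -> float:
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

for name, params in [("70B", 70), ("Llama 405B", 405), ("DeepSeek R1 671B", 671)]:
    print(name, {bits: round(weights_gb(params, bits)) for bits in (4, 8, 16)}, "GB")
# 70B:  ~35 / 70 / 140 GB   -> fits almost anywhere in this lineup
# 405B: ~203 / 405 / 810 GB -> 4-bit fits in 256GB, 8-bit needs the 512GB box
# 671B: ~336 / 671 / 1342 GB -> even 4-bit needs more than 256GB
```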
I agree Project Digits looks to be the better all-around option for AI researchers, but I still think the Mac is better for people building products with AI.
Re memory speed, Digits will be at 273GB/s while the Mac Studio is at 819GB/s.
Not to mention the Mac has six 120Gb/s Thunderbolt 5 ports and can easily be used for video editing, app development, etc.
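To put those bandwidth numbers in rough perspective, here's a back-of-envelope decode ceiling, assuming generation is purely memory-bandwidth-bound and every active weight is read once per token (real throughput will be lower, and the ~37B-active figure for an R1-style MoE is an approximation):

```python
# Upper-bound tokens/sec if decoding is limited only by reading the active
# weights from memory once per token. Ignores compute, KV cache, and overhead.

def tokens_per_sec_ceiling(bandwidth_gb_s: float, active_params_b: float,
                           bytes_per_param: float = 0.5) -> float:
    bytes_per_token = active_params_b * 1e9 * bytes_per_param  # 4-bit ~ 0.5 B/param
    return bandwidth_gb_s * 1e9 / bytes_per_token

for name, bw in [("Project Digits, 273 GB/s", 273), ("M3 Ultra, 819 GB/s", 819)]:
    dense_70b = tokens_per_sec_ceiling(bw, 70)  # 70B dense, 4-bit
    r1_moe = tokens_per_sec_ceiling(bw, 37)     # R1-style MoE, ~37B active, 4-bit
    print(f"{name}: ~{dense_70b:.0f} tok/s (70B dense), ~{r1_moe:.0f} tok/s (37B-active MoE)")
```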
No AI shop is buying Macs to use as servers. Apple should really release a server macOS distribution, maybe even rackable M-series hardware. I believe they have one internally.
True. But an AI shop doesn't care about that. They get more performance for the money by going with multiple NVIDIA GPUs. I have 512GB of RAM on my PC too, with 8 memory channels, but it's not like it's usable for AI workloads. It's nice to have large amounts of RAM, but increasing the batch size during training isn't going to help when compute is the bottleneck.
The question will be how it performs. I suspect DeepSeek and Llama 405B demonstrated the need for larger memory. Right now folks could build an Epyc system with that much RAM or more and run DeepSeek at about 6 tokens/sec for a fraction of the cost. However, not everyone is a tinkerer, so there's a market for those who don't want to be bothered. You say "AI gold rush" like it's a bad thing; it's not.
As with all things semiconductor, low volume = higher cost (and margin).
The people who need these crazy resources can tie them to some need that costs more than the hardware. You'd spend something like $10k a month running a machine with similar capabilities on AWS.
You have a point; technically they aren't impossible to run if you have enough system RAM (or hell, SSD/HDD space for that matter). But in practice, neither running on the CPU nor running on the GPU by constantly paging data in and out of VRAM is a very attractive option (~10x slowdown at least).
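For concreteness, here's roughly what that paging setup looks like with Hugging Face transformers/accelerate (the repo id is a placeholder; device_map="auto" is the actual mechanism, and the slowdown figure above is the rough estimate, not a benchmark):

```python
# Hedged sketch of CPU/disk offload with transformers + accelerate:
# device_map="auto" fills the GPU first and spills remaining layers to
# system RAM and then to disk. The repo id below is a placeholder.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-70b-model",  # placeholder, not a real checkpoint
    device_map="auto",          # GPU first, then CPU RAM, then offload_folder
    offload_folder="offload",   # weights that fit nowhere else land on disk
    torch_dtype="auto",
)
# Layers living in CPU RAM or on disk get copied to the GPU for each forward
# pass, which is where the roughly order-of-magnitude slowdown comes from.
```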
So the only reason the Mac is faster is that the RAM is accessible by its GPU, right? Not because the RAM is faster than regular RAM, because AFAIK it isn't far off from workstation RAM speeds.
I think the answer is because they can (there is a market for it). The benefit to a crazy person like me is that with this addition, I might be able to grab the 128GB version at a lower price.
It's not, though. For consumer computers somewhere in the $1k-4k range there's nothing better. But for the price of the 512GB configuration you could build your own machine with that much RAM, a crazy CPU, and 2x 5090s. The market fit is "needs power; needs/wants macOS; has no budget constraints," which is incredibly niche. But in terms of raw compute output there's absolutely no chance this is providing bang for the buck.
2x 5090s would only give you 64GB of memory to work with re:LLM workloads, which is what people are talking about in this thread. The 512GB of system RAM you’re referring to would not be useful in this context. Apple’s unified memory architecture is the part you’re missing.
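A small sketch of what that difference means in practice, via PyTorch's MPS backend (sizes are arbitrary; the point is that a single tensor bigger than any consumer card's VRAM can still live in GPU-addressable memory on Apple silicon):

```python
# On Apple silicon, the GPU (via PyTorch's MPS backend) addresses the same
# unified memory pool as the CPU, so allocations well past discrete-card VRAM
# sizes are possible if the machine has the RAM. Sizes here are arbitrary.
import torch

if torch.backends.mps.is_available():
    x = torch.zeros(150_000, 150_000, dtype=torch.float16, device="mps")
    gb = x.element_size() * x.nelement() / 1e9
    print(f"{gb:.0f} GB tensor on {x.device}")  # ~45 GB: more than a 5090's 32 GB
                                                # VRAM, fine in a big unified pool
```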