
The previous M2 Ultra topped out at 192GB of memory, or 128GB for the Pro and some other M3 models, which I think is plenty for 99.9% of professional tasks.

They've now bumped it to 512GB, along with an insane price tag of $9,499 for the 512GB Mac Studio. I'm pretty sure this is part of the AI gold rush.



Every single AI shop on the planet is trying to figure out whether there is enough compute here to make this a reasonable AI path. If the answer is yes, that $10k is an absolute bargain.


No, because there is no CUDA. We have fast and cheap alternatives to NVIDIA, but they do not have CUDA. This is why NVIDIA has 90% margins on its hardware.


CUDA simply isn't essential for modern inference stacks like vLLM and many others. DeepSeek V3 works great on SGLang. https://www.amd.com/en/developer/resources/technical-article...

Can you do absolutely everything? No. But most models will run or retrain fine now without CUDA. This premise keeps getting recycled from the past, even as that past has grown ever more distant.
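
For what it's worth, the serving layer is already hardware-agnostic from the application side: whether SGLang or vLLM is running on ROCm or CUDA underneath, clients just talk to an OpenAI-compatible endpoint. A minimal sketch (the port and model name here are placeholders, not anything specific):

    # Hypothetical client against a local SGLang/vLLM OpenAI-compatible server;
    # whether the backend is CUDA or ROCm is invisible at this layer.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")  # assumed local port

    resp = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-V3",  # whatever model the server was launched with
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=64,
    )
    print(resp.choices[0].message.content)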


CUDA is becoming more critical, not less, every day. Software developed around CUDA is vastly outpacing what other companies produce. And saving a few million when creating new models doesn't matter; NVIDIA is pretty efficient at scale.

I don't know if you've heard, but NVIDIA is about to add a monthly fee for additional CUDA features, and I'm almost certain that many big companies will be happy to pay for them.

> But most models will run or retrain fine now without CUDA.

This is correct for some small startups, not big companies.


CUDA is still incredibly important. It's a huge amount of work to get packages working across multiple GPU paradigms, and by default everyone still starts with CUDA.

The example I always give is FFT libraries: compare cuFFT to rocFFT. rocFFT only just released support for distributed transforms in December 2024, something you've been able to do since CUDA Toolkit v8.0, released in 2017. It's like this across the whole AMD toolkit; they're so far behind CUDA it's kind of laughable.


> that $10k is an absolute bargain

The higher-end Nvidia workstation boxes won't run well on normal 20-amp plugs. So you need to move them to a computer room (whoops, ripped those out already) or spend months getting dedicated circuits run to office spaces.


Didn't really think about this before, but that seems to be mainly an issue in North/Central America and Japan. In Germany, for example, typical household plugs are 16A at 230V.


In the US, normal circuits aren't always 20A, especially in residential buildings, where they are more commonly 15A in bedrooms and offices.

https://en.wikipedia.org/wiki/NEMA_connector


While technically true, the NEMA 5-15R receptacles are rated for use on 20A circuits, and circuits for receptacles are almost always 20A circuits, in modern construction at least. Older builds may not be, of course.

That said, if your load is going to be a continuous load drawing 80% of the rated amperage, it really should be a NEMA 5-20 plug and receptacle, the one where one of the prongs is horizontal instead of vertical. Swapping out the receptacle for one that accepts a NEMA 5-20P plug is like $5.

If you are going to actually run such a load on a 20A circuit with multiple receptacles, you will want to make sure you're not plugging anything substantial into any of the other receptacles on that circuit. A couple LED lights are fine. A microwave or kettle, not so much.
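
To put rough numbers on the 80% rule: a continuous load should stay at or below 80% of the breaker rating, so the headroom on a 120V circuit works out like this (the workstation wattage below is just an illustrative assumption):

    # NEC-style continuous-load check: continuous draw <= 80% of breaker rating.
    def max_continuous_watts(breaker_amps, volts=120):
        return breaker_amps * 0.8 * volts

    for amps in (15, 20):
        print(f"{amps}A circuit: {max_continuous_watts(amps):.0f} W continuous")
    # 15A -> 1440 W, 20A -> 1920 W. A multi-GPU box pulling ~1800 W (assumed figure)
    # squeaks by on a dedicated 20A circuit but not on a shared 15A one.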


> and circuits for receptacles are almost always 20A circuits, in modern construction at least.

This is not true. Standard builds (a majority) still use 15-amp circuits wherever 20-amp isn't required by the NEC.


Yes, almost all post-2000 houses will have 20-amp circuits in the kitchen, but in many areas the other circuits will be 15 amp.


To clarify, the circuit is almost always 20A, with 15A being used for lighting. However, the outlet itself is almost always 15A because you put multiple outlets on a single circuit. You are going to see very few 20A outlets (the ones with a T-shaped slot) in residential construction.


To clarify further, "20A" circuit just means a 20A breaker and suitable wire (12 AWG or larger).


I would check your breaker box as well. If a hair dryer trips anything then… well yeah probably older construction.


Is this actually true? Were people doing this with the 192GB of the M2 Ultra?

I'm curious to learn how AI shops actually do model development, if anyone has experience there. What I imagined was: it's all in the "cloud" (or their own infra), and the local machine doesn't matter. And if it did matter, the Nvidia software stack is too important, especially given that a 512GB M3 Ultra config costs $10,000+.


You’re largely correct for training models

Where this hardware shines is inference (i.e., building products on top of the models themselves).


True. But with Project Digits supposedly around the corner at $3,000, with ConnectX support and a Blackwell GPU, what's the over-under on just buying two of those at about half the price of one maxed-out M3 Ultra Mac Studio?


And how much VRAM will Project Digits have?


128GB each, so two would have 256GB.

It's half that of a max-spec Mac Studio, but also half the price and with eight times faster memory speed. Realistically, which open-source LLMs does 512GB over 256GB of memory unlock? My understanding is that the true bleeding-edge ones like R1 won't even handle 512GB well, especially with the anemic memory speed.
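
For a rough sense of what fits where, here's a weights-only sketch (parameter counts from the public model cards; real deployments also need room for KV cache, activations, and runtime overhead):

    # Back-of-envelope: weights-only footprint at a given quantization.
    GIB = 1024**3

    models = {                      # parameter counts from public model cards
        "Llama-3.1-70B": 70e9,
        "Llama-3.1-405B": 405e9,
        "DeepSeek-R1 (671B MoE)": 671e9,
    }

    for name, params in models.items():
        for bits in (4, 8):
            gib = params * bits / 8 / GIB
            print(f"{name:24s} {bits}-bit  ~{gib:5.0f} GiB")

    # 4-bit R1 (~312 GiB) and 8-bit 405B (~377 GiB) overflow 256GB but fit in 512GB,
    # which is roughly the class of models the bigger config unlocks.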


We really should wait and see what happens when Project Digits is finally released. Also, I would love it if NVIDIA decided to get into the CPU/GPU + unified memory space.

I can't imagine the M3 Ultra doing well on a model that loads into ~500GB, but it should be a blast on 70B models (well, twice as fast as my M3 Max at least) or even a heavily quantized 400B model.


I agree Project Digits looks to be the better all-around option for AI researchers, but I still think the Mac is better for people building products with AI.

Re memory speed: Digits will be at 273GB/s, while the Mac Studio is at 819GB/s.

Not to mention the Mac has six 120Gb/s Thunderbolt 5 ports and can easily be used for video editing, app development, etc.
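
Bandwidth matters because single-stream decode is mostly memory-bound: every generated token has to stream the active weights through the memory bus, so bandwidth divided by bytes-touched-per-token gives a hard ceiling on tokens/sec. A rough sketch using the figures above (model sizes and quantization are illustrative assumptions):

    # Upper bound on decode speed: tokens/sec <= bandwidth / bytes read per token.
    # Real-world numbers land well below this ceiling.
    def max_tokens_per_sec(bandwidth_gbs, active_params_billions, bits):
        bytes_per_token = active_params_billions * 1e9 * bits / 8
        return bandwidth_gbs * 1e9 / bytes_per_token

    for name, bw in [("Project Digits (273 GB/s)", 273), ("M3 Ultra (819 GB/s)", 819)]:
        dense = max_tokens_per_sec(bw, 70, 4)   # dense 70B model at 4-bit
        moe = max_tokens_per_sec(bw, 37, 4)     # DeepSeek-R1: ~37B active params per token
        print(f"{name}: ~{dense:.0f} tok/s (70B dense), ~{moe:.0f} tok/s (R1-style MoE)")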


No AI shop is buying Macs to use as servers. Apple should really release a server macOS distribution, maybe even rackable M-series hardware. I believe they have one internally.


Why would any business pay Apple Tax for a backend, server product?


Not much to figure out. It's 2x M4 Max, so you need 100 of these to match the TOPS of even a single consumer card like the RTX 5090.


Sure, but then you have models like DeepSeek (~400GB) that won't fit on a consumer card.


True. But an AI shop doesn't care about that. They get more performance for the money by going with multiple Nvidia GPUs. I have 512GB of RAM on my PC too, with 8 memory channels, but it's not like it's usable for AI workloads. It's nice to have large amounts of RAM, but increasing the batch size during training isn't going to help when compute is the bottleneck.


It's 2x M3 Max


> It's 2x M4 Max

Not exactly though.

This can have 512GB of unified memory; 2x M4 Max can only have 256GB total (128GB each).


Now do VRAM


LLMs easily use a lot of RAM, and these systems are MUCH, MUCH cheaper (though slower) than a GPU setup with the equivalent RAM.

A 4-bit quantization of Llama 3.1 405B, for example, should fit nicely.


The question will be how it performs. I suspect DeepSeek and Llama 405B demonstrated the need for larger memory. Right now folks could build an Epyc system with that much RAM or more to run DeepSeek at about 6 tokens/sec for a fraction of that cost. However, not everyone is a tinkerer, so there's a market for those who don't want to be bothered. You say "AI gold rush" like it's a bad thing; it's not.


Remember, that RAM is also VRAM, and half a terabyte of VRAM isn't cheap anywhere else. By comparison, Apple is a downright bargain!


It doesn't have the bandwidth of dedicated GPU VRAM.


Yes it does. It is just short of a 4090's memory bandwidth.

It's still far away from an H100 though.


Wouldn't multiple RTX GPUs also have more bandwidth?


The big question is: does the $10k price already reflect Trump's tariffs on China, or will the price rise further still?


Maybe 0.1% of tasks need this much RAM, so why are they charging so much?


I don't need 512GB of RAM but the moment I do I'm certain I'll have bigger things to worry about than a $10K price tag.


This is Pascal's wager written in terms of ... RAM. The original didn't make sense and neither does this iteration.


I would still wait until I need it before buying it…


Because the minority that needs that much RAM can't work without it.

In the media composing world they use huge orchestral templates with hundreds and hundreds of tracks with millions of samples loaded into memory.


Because the 0.1% is who will buy it? I mean, yeah, supply and demand. High demand in a niche with no supply currently means large margins.

I don't think anyone else commercially offers anywhere near this much unified memory, or NPUs/GPUs with anything close to 512GB of memory.


Maybe because 0.1% of tasks need this much RAM, it attracts a 0.1% price tag.


As with all things semiconductor, low volume = higher cost (and margin).

The people who need that kind of crazy resource can tie it to some need that pays for it. You'd spend something like $10k a month running a machine with similar capabilities in AWS.


It enables running giant AI models on a personal computer. They might not run very fast, but at least it's possible at all.


What is stopping us from running these models on a PC with 512GB RAM?


You have a point; technically they aren't impossible to run if you have enough system RAM (or hell, SSD/HDD space for that matter). But in practice, neither running on the CPU nor on the GPU by constantly paging data in and out of VRAM is a very attractive option (~10x slowdown at least).


So the only reason the Mac is faster is because the RAM is accessible by its GPU, right? Not because the RAM is faster than regular RAM, because AFAIK it isn't far off from workstation RAM speeds.


The RAM is faster. Eight 64GB DDR5 sticks at 8800 MT/s would in theory give you a maximum of 563.2 GB/s, whereas the M3 Ultra is at 819 GB/s.
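
For reference, that 563.2 GB/s is just the standard DDR bandwidth math, assuming one DIMM per channel on an 8-channel platform with 64-bit channels:

    # Theoretical DDR5 bandwidth: transfers/sec * bytes per transfer * channels.
    mt_per_s = 8800e6        # 8800 MT/s per channel
    bytes_per_transfer = 8   # 64-bit channel = 8 bytes
    channels = 8

    bandwidth = mt_per_s * bytes_per_transfer * channels
    print(f"{bandwidth / 1e9:.1f} GB/s")   # -> 563.2 GB/s, vs 819 GB/s on the M3 Ultra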


Can anyone even get 8 sticks to run together at 8800?


I'd say "No". IIRC the current number is somewhere 5-6K


The narrower the niche, the more you can charge.


I think the answer is because they can (there is a market for it). The benefit to a crazy person like me is that with this addition, I might be able to grab the 128GB version at a lower price.


Because they know there will be a large number of people who don't need this much RAM but will buy it anyway.


Because that's how much it's worth.


It's not, though. For consumer computers somewhere in the $1k-4k range there's nothing better. But for the price of 512GB of RAM here you could buy that plus a crazy CPU plus 2x 5090s by building your own. The market fit is "needs power; needs/wants macOS; has no budget constraint," which is incredibly niche. In terms of raw compute output there's absolutely no chance this is providing bang for the buck.


2x 5090s would only give you 64GB of memory to work with for LLM workloads, which is what people are talking about in this thread. The 512GB of system RAM you're referring to would not be useful in this context. Apple's unified memory architecture is the part you're missing.


How much VRAM do you get on those 2x 5090s?

How much would it cost to get up to 512GB?


Do you understand that it's UNIFIED RAM, so it doubles as VRAM? I would love to know what computer you can build for <$10k with 0.5TB of VRAM.



