The previous M2 Ultra topped out at 192GB of memory, or 128GB for the Pro and some other M3 models, which I think is plenty for 99.9% of professional tasks.
They've now bumped it to 512GB, along with an insane price tag of $9,499 for the 512GB Mac Studio. I'm pretty sure this is some AI gold rush.
Every single AI shop on the planet is trying to figure out whether this has enough compute to be a reasonable AI path. If the answer is yes, that $10k is an absolute bargain.
No, because there is no CUDA. We have fast and cheap alternatives to NVIDIA, but they do not have CUDA. This is why NVIDIA has 90% margins on its hardware.
Can you do absolutely everything? No. But most models will run or retrain fine now without CUDA. This premise keeps getting recycled from the past, even as that past has grown ever more distant.
CUDA is becoming more critical, not less, every day. Software developed around CUDA is vastly outpacing what other companies produce. And saving a few million when creating new models doesn't matter; NVIDIA is pretty efficient at scale.
I don't know if you've heard, but NVIDIA is about to add a monthly fee for additional CUDA features, and I'm almost certain that many big companies will be happy to pay it.
> But most models will run or retrain fine now without CUDA.
This is correct for some small startups, not big companies.
CUDA is still incredibly important. It's a huge amount of work to get packages working across multiple GPU backends, and by default everyone still starts with CUDA.
The example I always give is FFT libraries: compare cuFFT to rocFFT. rocFFT only just released support for distributed transforms in December 2024, something you've been able to do since CUDA Toolkit v8.0, released in 2017. It's like this across the whole AMD toolkit; they're so far behind CUDA it's kind of laughable.
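For the curious, here's a rough sketch of the kind of single-GPU call both stacks handle fine, via CuPy (which dispatches to cuFFT on NVIDIA builds and to the hipFFT/rocFFT stack on ROCm builds); the distributed, multi-GPU transforms are where the gap really shows:

```python
# Minimal sketch: a 2D FFT on the GPU via CuPy. On an NVIDIA build this is
# backed by cuFFT; on a ROCm build of CuPy the same call goes through
# hipFFT/rocFFT. Array sizes are arbitrary.
import cupy as cp

x = cp.random.random((4096, 4096)).astype(cp.float32)
X = cp.fft.fft2(x)       # single-GPU transform, fine on both stacks
print(X.shape, X.dtype)  # (4096, 4096) complex64
# The distributed, multi-GPU transforms discussed above are a different story,
# and that's where the CUDA toolkit has historically been years ahead.
```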
The higher-end NVIDIA workstation boxes won't run well on normal 20-amp plugs. So you need to move them to a computer room (whoops, ripped those out already) or spend months getting dedicated circuits run to office spaces.
Hadn't really thought about this before, but that seems to be mainly an issue in North/Central America and Japan. In Germany, for example, typical household plugs are 16A at 230V.
While technically true, NEMA 5-15R receptacles are rated for use on 20A circuits, and receptacle circuits are almost always 20A, at least in modern construction. Older builds may not be, of course.
That said, if your load is going to be a continuous load drawing 80% of the rated amperage, it really should be a NEMA 5-20 plug and receptacle, the one where one of the prongs is horizontal instead of vertical. Swapping out the receptacle for one that accepts a NEMA 5-20P plug is like $5.
If you are going to actually run such a load on a 20A circuit with multiple receptacles, you will want to make sure you're not plugging anything substantial into any of the other receptacles on that circuit. A couple LED lights are fine. A microwave or kettle, not so much.
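Rough numbers on that 80% rule, as a sketch (illustrative arithmetic only, not electrical advice):

```python
# Continuous-load headroom on a 120 V circuit under the usual 80% rule.
# Numbers are illustrative; check local code and an electrician for real work.

def continuous_watts(volts: float, breaker_amps: float, derate: float = 0.8) -> float:
    """Sustained wattage a circuit is meant to carry (80% of breaker rating)."""
    return volts * breaker_amps * derate

print(continuous_watts(120, 15))  # 1440.0 W on a 15 A circuit
print(continuous_watts(120, 20))  # 1920.0 W on a 20 A circuit
# A multi-GPU workstation pulling 1.5-2 kW sustained uses most or all of that
# on its own, which is why the dedicated-circuit question comes up at all.
```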
To clarify, the circuit is almost always 20A, with 15A being used for lighting. However, the outlets themselves are almost always 15A because you put multiple outlets on a single circuit. You'll see very few 20A outlets (which have a T-shaped prong) in residential construction.
Is this actually true? Were people doing this with the 192GB of the M2 Ultra?
I'm curious to learn how AI shops are actually doing model development, if anyone has experience there. What I imagined was: it's all in the "cloud" (or their own infra), and the local machine doesn't matter. And if the local machine did matter, the NVIDIA software stack is too important to give up, especially given that a 512GB M3 Ultra config costs $10,000+.
True. But with Project Digits supposedly around the corner, at a claimed $3,000, with ConnectX support and a Blackwell GPU, what's the over-under on just buying two of those at about half the price of one maxed-out M3 Ultra Mac Studio?
It's half the memory of a max-spec Mac Studio, but also half the price and eight times the memory speed. Realistically, which open-source LLMs does 512GB over 256GB of memory unlock? My understanding is that the true bleeding-edge ones like R1 won't even run well in 512GB, especially with the anemic memory speed.
We really should wait and see what happens when Project Digits is finally released. Also, I would love it if NVIDIA decided to get into the CPU/GPU + unified memory space.
I can't imagine the M3 Ultra doing well on a model that loads into ~500GB, but it should be a blast on 70B models (well, twice as fast as my M3 Max at least) or even a heavily quantized 400B model.
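Back-of-envelope weight footprints, just to see what 256GB vs 512GB actually fits (weights only, ignoring KV cache and runtime overhead, so treat these as lower bounds):

```python
# Rough weight footprints at different quantization levels. Parameter counts
# are the commonly cited totals; KV cache, activations, and runtime overhead
# are ignored, so real requirements are higher.

def weights_gb(params_billion: float, bits_per_param: float) -> float:
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

for name, params in [("70B", 70), ("Llama 405B", 405), ("DeepSeek R1 671B", 671)]:
    print(name, {bits: round(weights_gb(params, bits)) for bits in (4, 8, 16)}, "GB")
# 70B:  ~35 / 70 / 140 GB   -> fits almost anywhere in this lineup
# 405B: ~203 / 405 / 810 GB -> 4-bit fits in 256GB, 8-bit needs the 512GB box
# 671B: ~336 / 671 / 1342 GB -> even 4-bit needs more than 256GB
```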
I agree Project Digits looks to be the better all-around option for AI researchers, but I still think the Mac is better for people building products with AI.
Re memory speed, Digits will be at 273GB/s while the Mac Studio is at 819GB/s.
Not to mention the Mac has six 120Gb/s Thunderbolt 5 ports and can easily be used for video editing, app development, etc.
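To put those bandwidth numbers in rough perspective, here's a back-of-envelope decode ceiling, assuming generation is purely memory-bandwidth-bound and every active weight is read once per token (real throughput will be lower, and the ~37B-active figure for an R1-style MoE is an approximation):

```python
# Upper-bound tokens/sec if decoding is limited only by reading the active
# weights from memory once per token. Ignores compute, KV cache, and overhead.

def tokens_per_sec_ceiling(bandwidth_gb_s: float, active_params_b: float,
                           bytes_per_param: float = 0.5) -> float:
    bytes_per_token = active_params_b * 1e9 * bytes_per_param  # 4-bit ~ 0.5 B/param
    return bandwidth_gb_s * 1e9 / bytes_per_token

for name, bw in [("Project Digits, 273 GB/s", 273), ("M3 Ultra, 819 GB/s", 819)]:
    dense_70b = tokens_per_sec_ceiling(bw, 70)  # 70B dense, 4-bit
    r1_moe = tokens_per_sec_ceiling(bw, 37)     # R1-style MoE, ~37B active, 4-bit
    print(f"{name}: ~{dense_70b:.0f} tok/s (70B dense), ~{r1_moe:.0f} tok/s (37B-active MoE)")
```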
No AI shop is buying Macs to use as servers. Apple should really release a server macOS distribution, maybe even rackable M-series hardware. I believe they have one internally.
True. But an AI shop doesn't care about that. They get more performance for the money by going with multiple NVIDIA GPUs. I have 512GB of RAM on my PC too, with 8 memory channels, but it's not like it's usable for AI workloads. It's nice to have large amounts of RAM, but increasing the batch size during training isn't going to help when compute is the bottleneck.
The question will be how it performs. I suspect DeepSeek and Llama 405B demonstrated the need for larger memory. Right now folks could build an Epyc system with that much RAM or more and run DeepSeek at about 6 tokens/sec for a fraction of the cost. However, not everyone is a tinkerer, so there's a market for those who don't want to be bothered. You say "AI gold rush" like it's a bad thing; it's not.
As with all things semiconductor, low volume = higher cost (and margin).
The people who need these crazy resources can tie them to some need that costs more than the hardware. You'd spend something like $10k a month running a machine with similar capabilities on AWS.
You have a point; technically they aren't impossible to run if you have enough system RAM (or hell, SSD/HDD space for that matter). But in practice, neither running on the CPU nor running on the GPU by constantly paging data in and out of VRAM is a very attractive option (~10x slowdown at least).
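For concreteness, here's roughly what that paging setup looks like with Hugging Face transformers/accelerate (the repo id is a placeholder; device_map="auto" is the actual mechanism, and the slowdown figure above is the rough estimate, not a benchmark):

```python
# Hedged sketch of CPU/disk offload with transformers + accelerate:
# device_map="auto" fills the GPU first and spills remaining layers to
# system RAM and then to disk. The repo id below is a placeholder.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "some-org/some-70b-model",  # placeholder, not a real checkpoint
    device_map="auto",          # GPU first, then CPU RAM, then offload_folder
    offload_folder="offload",   # weights that fit nowhere else land on disk
    torch_dtype="auto",
)
# Layers living in CPU RAM or on disk get copied to the GPU for each forward
# pass, which is where the roughly order-of-magnitude slowdown comes from.
```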
So the only reason the Mac is faster is that the RAM is accessible by its GPU, right? Not because the RAM is faster than regular RAM, because AFAIK it isn't far off from workstation RAM speeds.
I think the answer is because they can (there is a market for it). The benefit to a crazy person like me is that with this addition, I might be able to grab the 128GB version at a lower price.
It's not, though. For consumer computers somewhere in the $1k-4k range there's nothing better. But for the price of the 512GB configuration you could build your own machine with that much RAM, a crazy CPU, and 2x 5090s. The market fit is "needs power; needs/wants macOS; has no budget constraints," which is incredibly niche. But in terms of raw compute output there's absolutely no chance this is providing bang for the buck.
2x 5090s would only give you 64GB of memory to work with re:LLM workloads, which is what people are talking about in this thread. The 512GB of system RAM you’re referring to would not be useful in this context. Apple’s unified memory architecture is the part you’re missing.
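A small sketch of what that difference means in practice, via PyTorch's MPS backend (sizes are arbitrary; the point is that a single tensor bigger than any consumer card's VRAM can still live in GPU-addressable memory on Apple silicon):

```python
# On Apple silicon, the GPU (via PyTorch's MPS backend) addresses the same
# unified memory pool as the CPU, so allocations well past discrete-card VRAM
# sizes are possible if the machine has the RAM. Sizes here are arbitrary.
import torch

if torch.backends.mps.is_available():
    x = torch.zeros(150_000, 150_000, dtype=torch.float16, device="mps")
    gb = x.element_size() * x.nelement() / 1e9
    print(f"{gb:.0f} GB tensor on {x.device}")  # ~45 GB: more than a 5090's 32 GB
                                                # VRAM, fine in a big unified pool
```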