Since no one specifically answered your question yet, yes, you should be able to...

lynguist · 2025-03-05T23:07:48 1741216068

I actually think it’s not a coincidence and they specifically built this M3 Ultra for DeepSeek R1 4-bit. They also highlight in their press release that they tested it with 600B class LLMs (DeepSeek R1 without referring to it by name). And they specifically did not stop at 256 GB RAM to make this happen. Maybe I’m reading too much into it.
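
The "did not stop at 256 GB" point can be checked with back-of-the-envelope arithmetic. The sketch below is only an estimate of the weights alone: it assumes R1's published total of 671B parameters (the comment only says "600B class"), uses decimal gigabytes, and ignores the KV cache, activations, and per-block quantization overhead. The function name is made up for illustration.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: 671B total parameters for DeepSeek R1 (not stated in the thread).
R1_PARAMS = 671e9

for bits in (16, 8, 4):
    gb = weight_footprint_gb(R1_PARAMS, bits)
    # Which of the two RAM configurations discussed here could hold the weights?
    fits = [cap for cap in (256, 512) if gb <= cap]
    print(f"{bits}-bit: {gb:.1f} GB of weights, fits in (GB configs): {fits}")
```

Under those assumptions the 4-bit weights come to about 335 GB, which is over 256 GB and under 512 GB; the 8-bit and 16-bit versions fit in neither.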

tgma · 2025-03-06T06:02:47 1741240967

Pretty sure this has absolutely nothing to do with DeepSeek or even local LLMs at large, which have been a thing for a while and an obvious use case ever since the original Llama leak and llama.cpp came around.

Fact is, Mac Pros in the Intel days supported 1.5TB of RAM in some configurations[1], and that was what their high-end customer base expected 6 years ago. They needed to address the gap for those customers, so they would have shipped such a product regardless. Local LLM is the cherry on top. DeepSeek in particular almost certainly had nothing to do with it. They will still need to double their supported RAM in their SoC to get there. Perhaps in a Mac Pro or a different quad-Max-glued chip.

[1]: https://support.apple.com/en-us/101639

saagarjha · 2025-03-06T07:15:29 1741245329

The thing that people are excited about here is unified memory that the GPU can address. Mac Pro had discrete GPUs with their own memory.

tgma · 2025-03-06T16:24:24 1741278264

I understand why they are excited about it—just pointing out it is a happy coincidence. They would have and should have made such a product to address the need of RAM users alone, not VRAM in particular, before they have a credible case to cut macOS releases on Intel.

water9 · 2025-03-06T07:38:59 1741246739

Intel integrated graphics technically also used unified memory, with standard DRAM.

kergonath · 2025-03-06T09:05:50 1741251950

Those also have terrible performance and worse bandwidth. I am not sure they are really relevant, to be honest.

McDaveNZ · 2025-03-06T08:18:37 1741249117

Did the Xeons in the Mac Pro even have integrated graphics?

icedchai · 2025-03-06T23:38:47 1741304327

So did the Amiga, almost 40 years ago...

vaxman · 2025-03-07T01:51:42 1741312302

You mean this? ;) http://de.wikipedia.org/wiki/Datei:Amiga_1000_PAL.jpg

RIP Jay Miner who watched his unified memory daughters Agnus, Denise and Paula be slowly murdered by Jack Tramiel's vengeance against Irving Gould. [Why couldn't the shareholders have stormed their boardroom 180 days before the company ran out of cash, installed interim management who, in turn, would have brought back the megalomaniac Founder that would, until his dying breath, keep spreading their cash to the super brilliant geniuses that made all the magic chips happen and then turn the resulting empire over to ops people to make their workplace so uncomfortable they all retire early and live happily ever after on tropical islands and snowy mountain tops?]

icedchai · 2025-03-07T03:26:39 1741317999

Yep! Though one could argue the Amiga wasn't true unified memory due to the chip RAM limitations. Depending on the Agnus revision, you'd be limited to 512K, 1 meg, or 2 meg max of RAM addressable by the custom chips ("chip RAM").

vaxman · 2025-03-12T08:45:40 1741769140

fun fact: M-series that are configured to use more than 75% of shared memory for GPU can make the system go boom...something to do with assumptions macOS makes that can be fixed by someone with a "private key" to access kernel mode (maybe not a hardware limit).

icedchai · 2025-03-15T20:43:52 1742071432

I messed around with that setting on one of my Macs. I wanted to load a large LLM model and it needed more than 75% of shared memory.
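
A rough sketch of why that limit gets in the way. The 75% figure is taken from the comment above and is not verified here; the function names, the 100 GB model size, and the 0.85 fraction are hypothetical values chosen only to illustrate the arithmetic.

```python
def gpu_budget_gb(total_ram_gb: float, gpu_fraction: float = 0.75) -> float:
    """Memory the GPU may use, given the fraction of unified memory it is allowed."""
    return total_ram_gb * gpu_fraction


def fits_on_gpu(model_gb: float, total_ram_gb: float, gpu_fraction: float = 0.75) -> bool:
    """True if a model of model_gb fits inside the GPU's share of unified memory."""
    return model_gb <= gpu_budget_gb(total_ram_gb, gpu_fraction)


# Hypothetical case: a 100 GB model on a 128 GB machine.
print(gpu_budget_gb(128))            # GPU share at the default fraction
print(fits_on_gpu(100, 128))         # does it fit at 75%?
print(fits_on_gpu(100, 128, 0.85))   # does it fit if the share is raised to 85%?
```

At 75%, a 128 GB machine leaves 96 GB for the GPU, so the hypothetical 100 GB model only fits once the share is raised.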

kmacdough · 2025-03-06T02:17:07 1741227427

That or it's the luckiest coincidence! In all seriousness, Apple is fairly consistent about not pushing specs that don't matter and >256GB is just unnecessary for most other common workloads. Factors like memory bandwidth, core count and consumption/heat would have higher impact.

That said, I doubt it was explicitly for R1, but rather based on the industry a few years ago, when GPT-3's 175B was SOTA but the industry was still looking larger. "As much memory as possible" is the name of the game for AI in a way that's not true for other workloads. It may not be true for AI forever either.

icedchai · 2025-03-06T23:43:46 1741304626

The high end Intel Macs supported over a TB of RAM, over 5 years ago. It's kinda crazy Apple's own high end chips didn't support more RAM. Also, the LLM use case isn't new... Though DeepSeek itself may be. RAM requirements always go up.

teknologist · 2025-03-08T05:27:00 1741411620

Just to clarify. There is an important difference between unified memory, meaning accessible by both CPU and GPU, and regular RAM that is only accessible by CPU.

angoragoats · 2025-03-10T20:50:51 1741639851

As mentioned elsewhere in this thread, unified memory existed long before Apple released the M1 CPU, and in fact many Intel processors that Apple used before supported it (though the Mac Pros that supported 1.5TB of RAM did not, as they did not have integrated graphics).

The presence of unified memory does not necessarily make a system better. It’s a trade off: the M-series systems have high memory bandwidth thanks to the large number of memory channels, and the integrated GPUs are faster than most others. But you can’t swap in a faster GPU, and when using large LLMs even a Mac Studio is quite slow compared to using discrete GPUs.
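
The bandwidth side of that trade-off can be put into rough numbers. This is a sketch under stated assumptions, not a benchmark: it treats token generation as purely memory-bandwidth-bound, so each generated token requires reading every active weight once. The 819 GB/s figure (Apple's stated M3 Ultra memory bandwidth) and the 37B active parameters per token for DeepSeek R1 are assumptions that do not come from this thread, and the function name is made up.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, active_params: float,
                         bits_per_weight: float) -> float:
    """Upper bound on generated tokens/second if every active weight is read once per token."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Assumptions: 819 GB/s memory bandwidth, 37B active parameters, 4-bit weights.
ceiling = decode_ceiling_tok_s(819, 37e9, 4)
print(f"bandwidth-bound ceiling: about {ceiling:.0f} tokens/s")
```

Real throughput will be lower than this ceiling, since it ignores KV cache reads and compute. It also says nothing about prompt processing, which is compute-bound and is where discrete GPUs tend to pull ahead.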

brookst · 2025-03-06T12:26:17 1741263977

Design work on the Ultra would have started 2-3 years ago, and specs for memory at least 18 months ago. I’m not sure they had that kind of inside knowledge for what Deepseek specifically was doing that far in advance. Did Deepseek even know that long ago?

happyopossum · 2025-03-06T16:30:10 1741278610

> they specifically built this M3 Ultra for DeepSeek R1 4-bit

Which came out in what, mid January? Yeah, there's no chance Apple (or anyone) has built a new chip in the last 45 days.

tempaccount420 · 2025-03-10T16:02:19 1741622539

Don't they build these Macs just-in-time? The bandwidth doesn't change with the RAM, so surely it couldn't have been that hard to just... use higher capacity RAM modules?

vaxman · 2025-03-07T01:18:48 1741310328

"No chance?" But it has been reported that the next generation of Apple Silicon started production a few weeks ago. Those deliveries may enable Apple to release its remaining M3 Ultra SKUs for sale to the public (because it has something Better for its internal PCC build-out).

It also may point to other devices ᯅ depending upon such new Apple Silicon arriving sooner, rather than later. (Hey, I should start a YouTube channel or religion or something. /s)

SV_BubbleTime · 2025-03-06T17:27:32 1741282052

No one is saying they built a new chip.

But the decision to come to market with a 512GB sku may have changed from not making sense to “people will buy this”.

cyanydeez · 2025-03-06T23:33:27 1741304007

Dies are designed in years.

This was just a coincidence.

SV_BubbleTime · 2025-03-07T16:01:19 1741363279

What part of “no one is saying they designed a new chip” is lost here?

cyanydeez · 2025-03-07T19:03:26 1741374206

Sorry, none of us are fanboys trying to shape "Apple is great" narratives.

forrestthewoods · 2025-03-05T23:52:54 1741218774

I don’t think you understand hardware timelines if you think this product had literally anything to do with anything DeepSeek.

reitzensteinm · 2025-03-06T05:23:55 1741238635

Chip? Yes. Product? Not necessarily...

It's not completely out of the question that the 512GB version of M3 Ultra was built for their internal Apple silicon servers powering Private Cloud Compute, but not intended for consumer release, until a compelling use case suddenly arrived.

I don't _think_ this is what happened, but I wouldn't go as far as to call it impossible.

forrestthewoods · 2025-03-06T06:07:12 1741241232

DeepSeek R1 came out Jan 20.

Literally impossible.

reitzensteinm · 2025-03-06T06:32:13 1741242733

The scenario is that the 512GB M3 Ultra was validated for the Mac Studio, and in volume production for their servers, but a business decision was made to not offer more than a 256GB SKU for Mac Studio.

I don't think this happened, but it's absolutely not "literally impossible". Engineering takes time, artificial segmentation can be changed much more quickly.

forrestthewoods · 2025-03-06T07:54:21 1741247661

From “internal only” to “delivered to customers” in 6 weeks is literally impossible.

ryao · 2025-03-06T14:33:19 1741271599

This change is mostly just using higher density ICs on the assembly line and printing different box art with a SKU change. It does not take much time, especially if they had planned it as a possible product just in case management changed its mind.

jahewson · 2025-03-06T09:15:49 1741252549

That's absurd. Fabbing custom silicon is not something anybody does for a few thousand internal servers. The unit economics simply don't work. Plus Apple is using OpenAI to provide its larger models anyway, so the need never even existed.

brookst · 2025-03-06T12:31:50 1741264310

Apple is positively building custom servers, and quantities are closer to the 100k range than 1000 [0]

But I agree they are not using M3 Ultra for that. It wouldn’t make any sense.

0. https://www.theregister.com/AMP/2024/06/11/apple_built_ai_cl...

teknologist · 2025-03-08T03:02:18 1741402938

That could be why they're also selling it as the Mac Studio M3 Ultra

bustling-noose · 2025-03-06T02:14:13 1741227253

My thoughts too. This product was in the pipeline maybe 2-3 years ago. Maybe with LLMs getting popular a year ago they tried to fit more memory, but it’s almost impossible to do that so close to a launch. Especially when memory is fused, not just a module you can swap.

tgma · 2025-03-06T06:13:33 1741241613

Your conclusion is correct but to be clear the memory is not "fused." It's soldered close to the main processor. Not even a Package-on-Package (two story) configuration.

See photo without heatspreader here: https://wccftech.com/apple-m2-ultra-soc-delidded-package-siz...

bustling-noose · 2025-03-08T05:06:47 1741410407

I think by fused I meant it's stuck onto the SoC module, not part of the SoC as I may have worded it. While you could maybe still add NANDs later in the manufacturing process, it's probably not easy, especially if you need more NANDs and a larger module, which might cause more design problems. The NAND is closer because the controller is in the SoC. So the memory controller probably would also change with higher memory sizes, which would mean this cannot be a last-minute change.

fennecfoxy · 2025-03-10T11:49:07 1741607347

Sheesh, the...comments on that link.

nightski · 2025-03-06T04:00:12 1741233612

$10k to run a 4 bit quantized model. Ouch.

OriginalMrPink · 2025-03-06T13:03:42 1741266222

That's today. What about tomorrow?

water9 · 2025-03-06T07:40:27 1741246827

The M4 MacBook Pro 128GB can run a 32B parameter model with 8-bit quantization just fine.

vaxman · 2025-03-06T06:40:37 1741243237

[flagged]

titanomachy · 2025-03-06T18:42:14 1741286534

I'm downvoting you because your use of language is so annoying, not because I work for Apple.

vaxman · 2025-03-07T01:22:11 1741310531

So, Microsoft?

fredoliveira · 2025-03-06T15:19:20 1741274360

what?

vaxman · 2025-03-06T19:39:18 1741289958

Sorry, an apostrophe got lost in "PO's"

vaxman · 2025-03-06T08:56:57 1741251417

[flagged]

1R053 · 2025-03-06T09:39:21 1741253961

are you comparing the same models? How did you calculate the TOPS for M3 Ultra?

vaxman · 2025-03-06T19:10:19 1741288219

An M3 Ultra is two M3 Max chips connected via fabric, so physics.

Did not mean to shit on anyone's parade, but it's a trap for novices, with the caveat that you reportedly can't buy a GB10 until "May 2025" and the expectation that it will be severely supply constrained. For some (overfunded startups running on AI monkey code? Youtube Influencers?), that timing is an unacceptable risk, so I do expect these things to fly off the shelves and then hit eBay this Summer.

jrflowers · 2025-03-06T21:48:14 1741297694

> they specifically built this M3 Ultra for DeepSeek R1 4-bit.

This makes sense. They started gluing M* chips together to make Mac Studios three years ago, which must have been in anticipation of DeepSeek R1 4-bit

a1o · 2025-03-05T23:53:07 1741218787

Any ideas on power consumption? I wonder how much power that would use. It looks like it would be more efficient than everything else that currently exists.

j45 · 2025-03-06T02:18:37 1741227517

Looks like up to 480W listed here

https://www.apple.com/mac-studio/specs/

a1o · 2025-03-06T16:27:24 1741278444

Thanks!!

ryao · 2025-03-06T01:56:51 1741226211

The M2 Ultra Mac Pro could reach a maximum of 330W according to Apple:

https://support.apple.com/en-us/102839

I assume it is similar.

drited · 2025-03-05T20:32:12 1741206732

I would be curious about context window size that would be expected when generating ballpark 20 to 20 tokens per second using Deepseek-R1 Q4 on this hardware?