
MacBook Pro M2 with 64GB of RAM. That's why I tend to be limited to Ollama and MLX - stuff that requires NVIDIA doesn't work for me locally.
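For context, this is roughly how I drive both stacks from Python. It's only a minimal sketch: it assumes `pip install ollama mlx-lm`, the Ollama daemon running, and that the named models are already pulled/downloaded; the Ollama tag and the mlx-community repo are just examples, swap in whatever you use.

    # Minimal sketch of the two local stacks mentioned above (Ollama + MLX).
    import ollama
    from mlx_lm import load, generate

    # Ollama (llama.cpp / GGUF under the hood); "gemma3:27b" is a placeholder tag
    reply = ollama.chat(
        model="gemma3:27b",
        messages=[{"role": "user", "content": "Say hello in five words."}],
    )
    print(reply["message"]["content"])

    # MLX (Apple's array framework); the repo name is just an example quant
    model, tokenizer = load("mlx-community/gemma-3-27b-it-qat-4bit")
    print(generate(model, tokenizer, prompt="Say hello in five words.", max_tokens=32))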


MLX is slower than GGUFs on Macs.

On my M1 Max MacBook Pro, the GGUF version bartowski/google_gemma-3-27b-it-qat-GGUF is 15.6GB and runs at 17 tok/sec, whereas mlx-community/gemma-3-27b-it-qat-4bit is 16.8GB and runs at 15 tok/sec. Note that both of these are the new QAT 4-bit quants.
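In case it's useful, this is roughly how I time them. A rough sketch only: the model names are the two quants above, the exact .gguf filename from the bartowski repo is assumed, and the tok/sec arithmetic is approximate since it includes prompt processing.

    # Rough timing sketch, assuming `pip install mlx-lm llama-cpp-python` (Metal build)
    # and that both quants above are already downloaded. Numbers will vary with
    # prompt length, context size, and thermals.
    import time

    PROMPT = "Explain quantization-aware training in two sentences."
    MAX_TOKENS = 200

    # MLX quant; verbose=True also prints mlx-lm's own prompt/generation tok/sec figures.
    from mlx_lm import load, generate
    model, tokenizer = load("mlx-community/gemma-3-27b-it-qat-4bit")
    t0 = time.time()
    out = generate(model, tokenizer, prompt=PROMPT, max_tokens=MAX_TOKENS, verbose=True)
    print("mlx ~", len(tokenizer.encode(out)) / (time.time() - t0), "tok/sec")

    # GGUF quant via llama-cpp-python; the filename below is assumed, and
    # n_gpu_layers=-1 offloads every layer to Metal.
    from llama_cpp import Llama
    llm = Llama(model_path="google_gemma-3-27b-it-qat-Q4_0.gguf", n_gpu_layers=-1, n_ctx=4096)
    t0 = time.time()
    res = llm(PROMPT, max_tokens=MAX_TOKENS)
    print("gguf ~", res["usage"]["completion_tokens"] / (time.time() - t0), "tok/sec")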


No, in general the MLX versions are faster; I've tested most of them.


What TPS difference are you getting?


> MacBook Pro M2 with 64GB of RAM

Are there non-mac options with similar capabilities?


Yes, but I don't really know anything about those. https://www.reddit.com/r/LocalLLaMA/ is full of people running models on PCs with NVIDIA cards.

The unique benefit of an Apple Silicon Mac at the moment is that the 64GB of RAM is available to both the GPU and the CPU at once. With other hardware you usually need dedicated separate VRAM for the GPU.
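If you want to see what that buys you in practice, MLX leans on it directly: the same array can feed both CPU and GPU ops with no explicit copy. A small sketch (nothing thread-specific, just the documented stream argument):

    # Unified memory in practice with MLX: one allocation, consumed by both devices.
    import mlx.core as mx

    a = mx.random.normal((4096, 4096))
    b = mx.random.normal((4096, 4096))

    on_gpu = mx.matmul(a, b, stream=mx.gpu)  # runs on the GPU
    on_cpu = mx.add(a, b, stream=mx.cpu)     # same buffers, no host/device transfer
    mx.eval(on_gpu, on_cpu)                  # MLX is lazy; force both to execute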


It’s not out yet, but the upcoming Framework desktop [0] is supposed to have a similar unified memory setup.

[0] https://frame.work/desktop


Anything with the Radeon 8060S / Ryzen AI Max+ 395. One of the popular Chinese mini-PC brands has them for preorder [0], with shipping starting May 7th. Framework also has them, but shipping is Q3.

0: https://www.gmktec.com/products/prepaid-deposit-amd-ryzen™-a...


Personally, I've never been able to get ROCm working reliably.


An Nvidia Jetson AGX Orin, if a desktop form factor works for you.


I remember seeing a post about someone running the full-size DeepSeek model on a dual-Xeon server with a ton of RAM.
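If anyone wants to try something similar, CPU-only inference of a big GGUF is basically one flag in llama-cpp-python. This is a hypothetical sketch, not that person's setup: the filename, thread count and context size are placeholders.

    # CPU-only sketch: keep every layer in system RAM, no GPU involved.
    from llama_cpp import Llama

    llm = Llama(
        model_path="deepseek-r1-q4_k_m.gguf",  # placeholder filename
        n_gpu_layers=0,                        # nothing offloaded to a GPU
        n_threads=64,                          # roughly the physical core count
        n_ctx=4096,
    )
    print(llm("Hello", max_tokens=32)["choices"][0]["text"])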



