> very few libraries are adapted to Apple Silicon, let alone optimized
This is a joke, right? Have you been anywhere in the LLM ecosystem for the past year or so? I'm constantly hearing about new ways in which ASi outperforms traditional platforms, and new projects that are optimized for ASi. Take llama.cpp, for instance.
The memory bandwidth of the M2 Ultra is around 800GB/s versus 1008GB/s for the 4090. While it's true the M2 has neither the bandwidth nor the GPU power, it is not limited to 24GB of VRAM per card. With its 192GB upper limit, the M2 Ultra will have a much easier time running inference on a 70+ billion parameter model, if that is your aim.
Besides size, heat, fan noise, and not having to build it yourself, this is the only area where Apple Silicon might have an advantage over a homemade 4090 rig.
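To make that concrete, here's a back-of-envelope bound on decode speed (my own sketch, not from the thread; the 40GB figure assumes a ~70B model quantized to roughly 4 bits per weight):

```python
# Rough decode-speed ceiling: generating each token reads (roughly) every weight
# once, so tokens/sec can't exceed memory bandwidth divided by model size.
def tokens_per_sec_ceiling(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

model_size_gb = 40  # assumed size of a ~70B model at ~4 bits/weight

print(tokens_per_sec_ceiling(800, model_size_gb))   # M2 Ultra @ 800 GB/s  -> ~20 tok/s ceiling
print(tokens_per_sec_ceiling(1008, model_size_gb))  # RTX 4090 @ 1008 GB/s -> ~25 tok/s ceiling,
                                                    # but a 40GB model doesn't fit in 24GB of VRAM
```

The bandwidth gap only moves the ceiling by about 25%; the capacity gap decides whether the model runs out of fast memory at all.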
It doesn't beat RTX 4090 when it comes to actual LLM inference speed. I bought a Mac Studio for local inference because it was the most convenient way to get something fast enough and with enough RAM to run even 155b models. It's great for that, but ultimately it's not magic - NVidia hardware still offers more FLOPS and faster RAM.
> It doesn't beat RTX 4090 when it comes to actual LLM inference speed
Sure, whisper.cpp is not an LLM. The 4090 can't even do inference at all on anything over 24GB, while ASi can chug through it even if slightly slower.
I wonder if with https://github.com/tinygrad/open-gpu-kernel-modules (the 4090 P2P patches) it might become a lot faster to split a too-large model across multiple 4090s and still outperform ASi (at least until someone at Apple does an MLX LLM).
PSA for all people who are still being misled by hand-wavy Apple M1 marketing charts[1] implying total dominance of M-series wondersilicon that obsoletes all Intel/NVIDIA PCs:
Benchmark data show that an Apple M2 Ultra is 47% and 60% slower than a Xeon W9 and an RTX 4090, or 0.35% and 2% slower than an i9-13900K and an RTX 4060 Ti, respectively, in the Geekbench 5 Multi-threaded and OpenCL Compute tests.
Apple Silicon Macs are NOT faster than competing desktop computers, nor was the M1 massively faster than an NVIDIA 3070 (the desktop part, which is about 2x faster than the laptop variant the M1 was actually compared against). They just offer up to 128GB of shared RAM/VRAM in slim desktops and laptops, which is handy for LLMs, and that's it.
Please stop taking Apple marketing materials at full face value or above. Thank you.
> The 4090 can't even do inference at all on anything over 24GB, while ASi can chug through it even if slightly slower.
Common LLM runners can split model layers between VRAM and system RAM; a PC rig with a 4090 can do inference on models larger than 24GB.
Where the crossover point is between keeping the whole thing in Apple Silicon unified memory and splitting layers on a PC with a 4090 and system RAM, I don't know, but it's definitely not “more than 24GB and a 4090 doesn't do anything”.
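For the curious, a minimal sketch of what that split looks like with llama-cpp-python (the model path and layer count are placeholders, not from this thread; how many layers fit in 24GB depends on the quant):

```python
# Partial GPU offload with llama-cpp-python: offloaded layers live in VRAM,
# everything else stays in system RAM and runs on the CPU.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-70b.Q4_K_M.gguf",  # hypothetical local GGUF file
    n_gpu_layers=40,  # however many layers fit in the 4090's 24GB
    n_ctx=4096,
)
out = llm("Q: Name one planet. A:", max_tokens=16)
print(out["choices"][0]["text"])
```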
> Common LLM runners can split model layers between VRAM and system RAM; a PC rig with a 4090 can do inference on models larger than 24GB.
Sure, and ASi can do inference on models larger than the Unified Memory if you account for streaming the weights from the SSD on demand. That doesn't mean it's going to be as fast as keeping the whole thing in RAM, although ASi SSDs are probably not particularly bad as far as SSDs go.
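To be concrete about the streaming point, llama.cpp memory-maps the weights by default, so the OS can page them in from the SSD on demand rather than loading everything up front. A rough sketch with llama-cpp-python (the model path is a placeholder):

```python
# Rely on mmap so a model bigger than free RAM gets paged in from the SSD on
# demand (at a big speed penalty once reads actually hit the disk).
from llama_cpp import Llama

llm = Llama(
    model_path="models/some-huge-model.Q4_K_M.gguf",  # hypothetical file larger than free RAM
    use_mmap=True,    # map the weights file instead of copying it all into RAM (the default)
    use_mlock=False,  # don't pin pages, so the OS is free to evict and re-read them
    n_ctx=2048,
)
```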
Slightly slower in this case is more like 10x. I have an M3 Max with 128GB RAM: the 4090 trashes it on anything under 24GB, then the M3 Max trashes the 4090 on anything above 24GB, but it's still roughly 10x slower on those bigger models than the 4090 is on <24GB ones.
Yeah. Let me just walk down to Best Buy and get myself a GPU with over 24 gigabytes of VRAM (impossible) for less than $3,000 (even more impossible). Then tell me ASi is nothing compared to Nvidia.
Even the A100, at something around $15,000 (edit: used to say $10,000), only goes up to 80 gigabytes of VRAM, while a 192GB Mac Studio goes for under $6,000.
Those figures alone prove Nvidia isn't even competing in the consumer or even the enthusiast space anymore. They know you'll buy their hardware if you really need it, so they aggressively segment the market with VRAM restrictions.
Oops, I remembered it being somewhere near $15k, but Google got confused and showed me results for the 40GB instead, so I put $10k by mistake. Thanks for the correction.
A100 80GB goes for around $14,000 - $20,000 on eBay and A100 40GB goes for around $4,000 - $6,000. New (not from eBay - from PNY and such), it looks like an 80GB would set you back $18,000 to $26,000 depending on whether you want HBM2 or HBM2e.
Meanwhile you can buy a Mac Studio today without going through a distributor and they're under $6,000 if the only thing you care about is having 192GB of Unified Memory.
And while the memory bandwidth isn't quite as high as the 4090's, the M-series chips can run certain models faster anyway, if Apple is to be believed.