
What hardware are you able to run this on?



If your job or hobby involves LLMs in any way, and you like to "Work Anywhere", it's hard not to justify the MBP Max (e.g. M3 Max, now M4 Max) with 128GB. You can run more than you'd think, faster than you'd think.

See also Hugging Face's MLX community:

https://huggingface.co/mlx-community

QwQ 32B is featured:

https://huggingface.co/collections/mlx-community/qwq-32b-pre...

If you want a traditional GUI, LM Studio beta 0.3.x is iterating on MLX: https://lmstudio.ai/beta-releases
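
If you'd rather skip the GUI, here's a rough sketch of what running it with mlx-lm looks like. The repo name is an assumption; check the mlx-community page for the current 4-bit QwQ upload:

    # pip install mlx-lm  (Apple silicon only)
    from mlx_lm import load, generate

    # repo name is illustrative; see mlx-community for the exact quantized upload
    model, tokenizer = load("mlx-community/QwQ-32B-Preview-4bit")

    messages = [{"role": "user", "content": "Integrate x^2 * e^x dx, step by step."}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

    # verbose=True prints the generation plus tokens-per-second stats
    generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)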


For that price you could get some beefy Nvidia GPUs or a lot of cloud credits, but the unified memory and nice laptop are a real bonus.

I've been off Macs for ten years since OS X started driving me crazy, but I've been strongly considering picking up the latest Mac mini as a poor man's version of what you're talking about. For €1k you can get an M4 with 32GiB of unified RAM, or an M4 Pro with 64GiB for €2k, which is a bit more affordable.

If you shucked the cheap ones into your rack you could have a very hefty little Beowulf cluster for the price of that MBP.


US$4,699. Quickest justification I ever made not to buy something.


To add to that, given the wild trajectory of the field, it's at the very least doubtful that that's going to buy you access to hardware (let alone a model) that's still even remotely desirable for relevant AI use even a year from now.


But it has the logo I’ve been told to buy, so regardless of the quality or specs I need to buy it, how else will I ever show my face in Zoom meetings to my SoCal colleagues?


Yeah, exactly. If you don't want a discrete GPU of your own in a desktop that you could technically access remotely, then go with cloud GPU.

Why pay Apple silly money for their RAM when you could take that same money, get a MacBook Air, and build a desktop with a 4090 in it (hell, if you already have the desktop you could buy TWO 4090s for that money)? Then just set up the desktop as a server to use remotely.
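
The client side of that really is just a few lines. Sketch only: it assumes the desktop exposes an OpenAI-compatible endpoint (llama.cpp's llama-server and Ollama both do), and the hostname and model tag here are made up:

    # pip install openai; point the client at the remote box instead of api.openai.com
    from openai import OpenAI

    client = OpenAI(base_url="http://my-desktop.local:11434/v1", api_key="unused")

    resp = client.chat.completions.create(
        model="qwq",  # whatever model the server has loaded or pulled
        messages=[{"role": "user", "content": "Explain the trade-offs of unified memory."}],
    )
    print(resp.choices[0].message.content)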


> why

offline. anywhere.

Separately, if you price out anything from Dell to Framework, you quickly learn there's no "Apple tax": there's a cost for a given type of engineering to a given spec (for an oranges-to-oranges TCO you should spec everything, including screen, energy, and resale value), and it's comparable from any manufacturer, with Apple's resale value dropping its TCO below the others.


Yeah, I suppose if you wanna hike up a mountain and still be able to use a full-sized model, then sure. Be my guest, I ain't doing that any time soon!


Works well for me on an MBP with 36GB RAM, with no swapping (only just).

I've been asking it to perform relatively complex integrals and it either manages them (with step by step instructions) or is very close with small errors that can be rectified by following the steps manually.


M2 MacBook Pro with 64GB of RAM.


I am running it on a 32GB Mac mini with an M2 Pro using Ollama. It runs fine, faster than I expected. The way it explains its plan for solving a problem, then proceeds step by step, is impressive.


How many tokens per second?


Another data point:

17.6 tokens/s on an M4 Max (40-core GPU)


I am away from my computer, but I think it was about 10 tokens/second - not too bad.


8.4 tokens/s on an M1 Pro with 32GB RAM (Q4 model, 18GB).
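
If anyone wants to reproduce these numbers, Ollama reports eval counts and durations in its responses. Rough sketch with the Python client; the model tag is an assumption, use whatever you pulled:

    # pip install ollama; talks to a locally running Ollama server
    import ollama

    resp = ollama.generate(model="qwq", prompt="Integrate x^2 * e^x dx.")

    # eval_count = tokens generated, eval_duration = nanoseconds spent generating
    tok = resp["eval_count"]
    secs = resp["eval_duration"] / 1e9
    print(f"{tok} tokens in {secs:.1f}s -> {tok / secs:.1f} tokens/s")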


Sorry for the random question, but I wonder if you know: what's the status of running LLMs on non-NVIDIA GPUs nowadays? Are they viable?


I run llama on 7900XT 20GB, works just fine.


Apple silicon is pretty damn viable.


Pretty sure they meant AMD


Yeah, but if you buy ones with enough RAM, you're not really saving money compared to NVIDIA, and you're likely behind in perf.


Nvidia won’t sell these quantities of RAM at Apple’s pricing. An A100 80GB is $14k, while an M3 Max MBP with 96GB of RAM can be had for $2.7k.


96GB of unified RAM. How much of that is available to the graphics cores? I haven't tested a later model but the M1 Max would max out at 16GB VRAM regardless of how much the machine had.

There's a reason companies are setting up clusters of A100s, not MacBooks.


Not only that, but Apple's RAM is roughly 0.5TB/s, while a 4090 gets 1TB/s. I feel the discrete card is the better value proposition: nobody should need to be running 80GB models on a laptop; that's more of a high-perf/research use case. You could argue it would be useful as a copilot, but if you've tuned your machine to use all its RAM for the model, you can't do anything else with it. It's also such a specific use case for the machine that trying to sell it would be hard, whereas I can hock off a GPU to someone doing data work, ML, gaming, video editing, etc.
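
The bandwidth gap matters because single-stream decoding is roughly memory-bound: you read more or less the whole weight file once per token. Back-of-envelope, using the 18GB Q4 figure from upthread (these are ceilings, not benchmarks; real numbers land well below them, as the figures above show):

    # Rough ceiling for memory-bound decode: bandwidth divided by bytes of weights per token.
    def tok_per_s_ceiling(weights_gb, bandwidth_gb_s):
        return bandwidth_gb_s / weights_gb

    print(tok_per_s_ceiling(18, 500))   # ~28 tok/s at 0.5TB/s (M-series class)
    print(tok_per_s_ceiling(18, 1000))  # ~56 tok/s at 1TB/s (4090 class)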


https://techobsessed.net/2023/12/increasing-ram-available-to... says it's tunable via the terminal, down to 2GiB reserved for the OS with the rest available for GPU use.
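
For what it's worth, the knob that article describes is a sysctl. Rough sketch only: the key name (iogpu.wired_limit_mb, on Sonoma) and the 8GB OS reserve are my assumptions, it needs sudo, and it resets on reboot; see the article for details:

    # Sketch: raise the GPU wired-memory limit on Apple silicon.
    # Key name and OS reserve are assumptions; check the linked article.
    import subprocess

    total_gb = 96
    reserve_for_os_gb = 8
    limit_mb = (total_gb - reserve_for_os_gb) * 1024

    subprocess.run(["sysctl", "iogpu.wired_limit_mb"], check=True)  # show the current limit
    subprocess.run(["sudo", "sysctl", f"iogpu.wired_limit_mb={limit_mb}"], check=True)  # raise it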



