
What hardware are you able to run this on?



If your job or hobby involves LLMs in any way, and you like to "Work Anywhere", it's hard not to justify the MBP Max (e.g. M3 Max, now M4 Max) with 128GB. You can run more than you'd think, faster than you'd think.

See also Hugging Face's MLX community:

https://huggingface.co/mlx-community

QwQ 32B is featured:

https://huggingface.co/collections/mlx-community/qwq-32b-pre...

If you want a traditional GUI, LM Studio beta 0.3.x is iterating on MLX: https://lmstudio.ai/beta-releases
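
If you'd rather skip the GUI, here's a rough sketch of what running it with mlx-lm looks like. The repo name is an assumption; check the mlx-community page for the current 4-bit QwQ upload:

    # pip install mlx-lm  (Apple silicon only)
    from mlx_lm import load, generate

    # repo name is illustrative; see mlx-community for the exact quantized upload
    model, tokenizer = load("mlx-community/QwQ-32B-Preview-4bit")

    messages = [{"role": "user", "content": "Integrate x^2 * e^x dx, step by step."}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

    # verbose=True prints the generation plus tokens-per-second stats
    generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)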


For that price you could get some beefy Nvidia GPUs or a lot of cloud credits, but the unified memory and nice laptop are a real bonus.

I've been off Macs for ten years since OS X started driving me crazy, but I've been strongly considering picking up the latest Mac mini as a poor man's version of what you're talking about. For €1k you can get an M4 with 32GiB of unified RAM, or an M4 Pro with 64GiB for €2k, which is a bit more affordable.

If you shucked the cheap ones into your rack you could have a very hefty little Beowulf cluster for the price of that MBP.


US$4,699. Quickest justification I ever made not to buy something.


To add to that, given the wild trajectory of the field, it's at the very least doubtful that that's going to buy you access to hardware (let alone a model) that's still even remotely desirable for relevant AI use even a year from now.


But it has the logo I’ve been told to buy, so regardless of the quality or specs I need to buy it, how else will I ever show my face in Zoom meetings to my SoCal colleagues?


Yeah, exactly. If you don't want a discrete GPU of your own in a desktop that you could technically access remotely, then go with cloud GPU.

Why pay Apple silly money for their RAM when you could take that same money, get a MacBook Air, and build a desktop with a 4090 in it (hell, if you already have the desktop you could buy TWO 4090s for that money)? Then just set up the desktop as a server to use remotely.
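
The client side of that really is just a few lines. Sketch only: it assumes the desktop exposes an OpenAI-compatible endpoint (llama.cpp's llama-server and Ollama both do), and the hostname and model tag here are made up:

    # pip install openai; point the client at the remote box instead of api.openai.com
    from openai import OpenAI

    client = OpenAI(base_url="http://my-desktop.local:11434/v1", api_key="unused")

    resp = client.chat.completions.create(
        model="qwq",  # whatever model the server has loaded or pulled
        messages=[{"role": "user", "content": "Explain the trade-offs of unified memory."}],
    )
    print(resp.choices[0].message.content)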


> why

offline. anywhere.

Separately, if you price out anything from Dell to Framework, you quickly learn there's no "Apple tax": there's a cost for a given type of engineering to a given spec (for an oranges-to-oranges TCO you should spec everything, including screen, energy, and resale value), and it's comparable from any manufacturer, with Apple's resale value dropping its TCO below the others.


Yeah, I suppose if you wanna hike up a mountain and still be able to use a full-sized model, then sure. Be my guest, I ain't doing that any time soon!


Works well for me on an MBP with 36GB RAM, with no swapping (only just).

I've been asking it to perform relatively complex integrals and it either manages them (with step by step instructions) or is very close with small errors that can be rectified by following the steps manually.


M2 MacBook Pro with 64GB of RAM.


I am running it on a 32GB Mac mini with an M2 Pro using Ollama. It runs fine, faster than I expected. The way it explains its plan for solving a problem, then proceeds step by step, is impressive.


How many tokens per second?


Another data point:

17.6 tokens/s on an M4 Max (40-core GPU)


I am away from my computer, but I think it was about 10 tokens/second - not too bad.


8.4 tokens/s on an M1 Pro with 32GB RAM (Q4 model, 18GB).
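
If anyone wants to reproduce these numbers, Ollama reports eval counts and durations in its responses. Rough sketch with the Python client; the model tag is an assumption, use whatever you pulled:

    # pip install ollama; talks to a locally running Ollama server
    import ollama

    resp = ollama.generate(model="qwq", prompt="Integrate x^2 * e^x dx.")

    # eval_count = tokens generated, eval_duration = nanoseconds spent generating
    tok = resp["eval_count"]
    secs = resp["eval_duration"] / 1e9
    print(f"{tok} tokens in {secs:.1f}s -> {tok / secs:.1f} tokens/s")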


Sorry for the random question, but I wonder if you know: what's the status of running LLMs on non-NVIDIA GPUs nowadays? Are they viable?


I run llama on 7900XT 20GB, works just fine.


Apple silicon is pretty damn viable.


Pretty sure they meant AMD


Yeah, but if you buy ones with enough RAM, you're not really saving money compared to NVIDIA, and you're likely behind in perf.


Nvidia won’t sell these quantities of RAM at Apple’s pricing. An A100 80GB is $14k, while an M3 Max MBP with 96GB of RAM can be had for $2.7k.


96GB of unified RAM. How much of that is available to the graphics cores? I haven't tested a later model but the M1 Max would max out at 16GB VRAM regardless of how much the machine had.

There's a reason companies are setting up clusters of A100s, not MacBooks.


Not only that, but Apple's RAM is roughly 0.5TB/s, while a 4090 gets 1TB/s. I feel the discrete card is the better value proposition: nobody should need to be running 80GB models on a laptop; that's more of a high-perf/research use case. You could argue it would be useful as a copilot, but if you've tuned your machine to use all its RAM for the model, you can't do anything else with it. It's also such a specific use case for the machine that trying to sell it would be hard, whereas I can hock off a GPU to someone doing data work, ML, gaming, video editing, etc.
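
The bandwidth gap matters because single-stream decoding is roughly memory-bound: you read more or less the whole weight file once per token. Back-of-envelope, using the 18GB Q4 figure from upthread (these are ceilings, not benchmarks; real numbers land well below them, as the figures above show):

    # Rough ceiling for memory-bound decode: bandwidth divided by bytes of weights per token.
    def tok_per_s_ceiling(weights_gb, bandwidth_gb_s):
        return bandwidth_gb_s / weights_gb

    print(tok_per_s_ceiling(18, 500))   # ~28 tok/s at 0.5TB/s (M-series class)
    print(tok_per_s_ceiling(18, 1000))  # ~56 tok/s at 1TB/s (4090 class)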


https://techobsessed.net/2023/12/increasing-ram-available-to... says it's tunable via the terminal, down to 2GiB reserved for the OS with the rest available for GPU use.
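
For what it's worth, the knob that article describes is a sysctl. Rough sketch only: the key name (iogpu.wired_limit_mb, on Sonoma) and the 8GB OS reserve are my assumptions, it needs sudo, and it resets on reboot; see the article for details:

    # Sketch: raise the GPU wired-memory limit on Apple silicon.
    # Key name and OS reserve are assumptions; check the linked article.
    import subprocess

    total_gb = 96
    reserve_for_os_gb = 8
    limit_mb = (total_gb - reserve_for_os_gb) * 1024

    subprocess.run(["sysctl", "iogpu.wired_limit_mb"], check=True)  # show the current limit
    subprocess.run(["sudo", "sysctl", f"iogpu.wired_limit_mb={limit_mb}"], check=True)  # raise it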



