Hacker News

The sweet spot for running local LLMs (from what I'm seeing on forums like r/LocalLLaMA) is 2 to 4 3090s, each with 24GB of VRAM. Nvidia (or AMD or Intel) would clean up if they offered a card with 3090-level performance but 64GB of VRAM. It doesn't have to be a leading-edge GPU, just a decent one with lots of VRAM. This is roughly what Digits will be (though the memory bandwidth will be lower because it uses LPDDR5X rather than GDDR) and roughly what AMD's Strix Halo is aiming for: unified-memory systems where the CPU and GPU share access to the same large pool of memory.
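To see why 2x 24GB cards are the sweet spot, a rough rule of thumb is that model weights take about params × bits/8 bytes. This sketch (my own back-of-envelope, ignoring KV cache and activation overhead) shows why a 70B model needs quantization to fit on a pair of 3090s:

```python
# Back-of-envelope VRAM estimate for model weights only.
# Assumption: memory ~= params * bits_per_weight / 8,
# ignoring KV cache, activations, and framework overhead.

def weight_gb(params_billions, bits_per_weight):
    """Approximate GB needed to hold the weights alone."""
    return params_billions * bits_per_weight / 8

# A LLaMA-class 70B model:
print(weight_gb(70, 16))  # fp16: 140 GB -- far beyond consumer cards
print(weight_gb(70, 4))   # 4-bit quant: 35 GB -- fits across 2x 24GB 3090s
```

The same arithmetic shows why a hypothetical 64GB card would be popular: it would hold a 4-bit 70B model, plus KV cache, on a single GPU.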


The issue here is that even with a lot of VRAM you may be able to load the model, but with a large context it will still be too slow: processing a 30k+ token prompt with LLaMA 70B, for example, takes minutes.
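That "minutes" figure is consistent with a common transformer rule of thumb: prefill needs roughly 2 × params × tokens FLOPs. A hedged estimate (the 35 TFLOPS FP16 figure for a 3090 and the 50% utilization are my assumptions, and real throughput varies widely with batch, kernels, and memory bandwidth):

```python
# Back-of-envelope prompt-processing (prefill) time estimate.
# Assumption: prefill compute ~= 2 * n_params * n_tokens FLOPs,
# a common rule of thumb; actual utilization varies widely.

def prefill_seconds(n_params, n_tokens, tflops, utilization=0.5):
    """Estimated seconds to process an n_tokens prompt."""
    flops_needed = 2 * n_params * n_tokens
    effective_flops = tflops * 1e12 * utilization
    return flops_needed / effective_flops

# 70B model, 30k-token prompt, two ~35 TFLOPS (FP16) 3090s at 50% utilization:
t = prefill_seconds(70e9, 30_000, 2 * 35, 0.5)
print(f"{t / 60:.1f} minutes")  # on the order of minutes
```

This is also why unified-memory boxes like Digits can be deceptive: fitting the model in memory solves capacity, but prefill is compute- and bandwidth-bound, so a big slow pool of RAM doesn't make long contexts fast.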




