
One 3090 seems to be equivalent to one M3 Max at inference: https://www.reddit.com/r/LocalLLaMA/s/BaoKxHj8ww

There are many such threads on Reddit. The M4 Max is incrementally faster, maybe 20%. Even factoring in electricity costs, a 2x 3090 setup is IMO the sweet spot cost/benefit-wise.

And it’s maybe a zany line of argumentation, but 2x 3090s draw roughly 10x the power of an M4 Max (about 2 × 350 W vs. something on the order of 70 W under GPU load). The M4 Max may be the most efficient setup out there, but it’s not nearly 10x as efficient per watt, so that power gap is IMO where its compute deficit comes from.



What is the GPU memory on that 3090?


24GB of VRAM each. Using multiple cards scales well because the model can be split by layers and run in a pipelined fashion (sketch below).
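
A minimal sketch of the idea in PyTorch, using a toy stack of transformer blocks rather than any real model (the dimensions and split point are illustrative). Each half of the layer stack lives on one GPU, and only the activations cross the bus between them:

    import torch
    import torch.nn as nn

    # Toy stand-in for an LLM: a stack of transformer blocks.
    d_model, n_layers = 512, 8
    layers = [nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
              for _ in range(n_layers)]

    # Split by layers: first half on GPU 0, second half on GPU 1.
    first = nn.Sequential(*layers[:n_layers // 2]).to("cuda:0")
    second = nn.Sequential(*layers[n_layers // 2:]).to("cuda:1")

    @torch.no_grad()
    def forward(x):
        x = first(x.to("cuda:0"))
        # Only the activations hop across PCIe, once per batch/token step,
        # which is why layer-wise splitting scales well for inference.
        x = second(x.to("cuda:1"))
        return x

    out = forward(torch.randn(1, 16, d_model))  # (batch, seq, d_model)
    print(out.shape)

In practice you wouldn't hand-roll this: llama.cpp's --tensor-split and Hugging Face's device_map="auto" handle the placement across cards for you.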



