

Art9681 41 days ago | on: MacBook Pro with M5 Pro and M5 Max

It's going to be faster no matter what. My M3 Max prints tokens faster than I can read with the new MoE models. What kills it is prompt processing once the context grows beyond a threshold, which is easy to hit in modern agentic loops.
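A rough sketch of the arithmetic behind this point, assuming a simple two-phase model: response time is prompt-processing (prefill) time plus generation (decode) time, and only the prefill term scales with context length. The function name `response_time_s` and both throughput figures are hypothetical placeholders, not measurements from an M3 Max or any other machine. The model also assumes a constant prefill rate, which is optimistic for long contexts.

```python
def response_time_s(prompt_tokens, output_tokens, prefill_tok_per_s, decode_tok_per_s):
    """Return (prefill_seconds, decode_seconds) for a single model call."""
    prefill = prompt_tokens / prefill_tok_per_s   # time before the first token appears
    decode = output_tokens / decode_tok_per_s     # time spent printing the answer
    return prefill, decode


# Hypothetical rates chosen only to illustrate the shape of the problem.
PREFILL_RATE = 500.0  # prompt tokens processed per second (assumed)
DECODE_RATE = 50.0    # output tokens generated per second (assumed)

for context in (2_000, 20_000, 100_000):
    prefill, decode = response_time_s(context, 300, PREFILL_RATE, DECODE_RATE)
    print(f"context={context:>7} tokens  prefill={prefill:6.1f}s  decode={decode:5.1f}s")
```

With these assumed numbers the decode time stays the same at every context size, while the wait before the first token grows in proportion to the prompt.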

fulafel 41 days ago

If your computer was faster at it, you could run more capable models at the same token rate.
