
I checked the fine print on the product website: by “up to 4x faster LLM prompt processing,” they’re specifically referring to time to first token. So it’s not about token generation rate (tokens per second).
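The distinction matters when benchmarking: prompt processing shows up as time to first token (TTFT), while decode speed shows up as the steady-state tokens-per-second after the first token. A minimal sketch of how you'd separate the two, using a simulated token stream (the stream and its timings are hypothetical stand-ins, not any real API):

```python
import time

def measure(stream):
    """Given any iterator that yields tokens, report
    (time_to_first_token, steady_state_tokens_per_sec)."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in stream:
        count += 1
        if ttft is None:
            ttft = time.perf_counter() - start  # prompt-processing latency
    total = time.perf_counter() - start
    # Generation rate excludes the first token, so prompt processing
    # doesn't pollute the decode-speed number.
    if count > 1 and total > ttft:
        tps = (count - 1) / (total - ttft)
    else:
        tps = float("nan")
    return ttft, tps

def fake_stream():
    """Simulated model: slow 'prompt processing', then fast decode."""
    time.sleep(0.2)           # stand-in for prompt processing
    for _ in range(20):
        time.sleep(0.01)      # stand-in for per-token decode time
        yield "tok"

ttft, tps = measure(fake_stream())
```

With this split, a "4x faster prompt processing" claim would shrink only the first number and leave the second roughly unchanged.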


Yes. This is known. They added neural accelerators, i.e. the equivalent of Nvidia's Tensor Cores, to the GPU. This will make prompt processing competitive with similar-class GPUs.


It’s a big deal! Prompt processing was previously the Mac’s weak point. Sure, output generation speed matters when a model recites whole files back in programming, but in general conversation I’d rather have it output a short answer anyway (after extensive processing by a smart model).


General conversation is already free with all the major providers (Claude, ChatGPT, etc.). That's not where the major gains in productivity lie.


It would probably be worth finding a friendlier way to market this, but it's a reasonable and accurate way to put it.

The prompt processing sped up.

Not the output generation.

M4 was notoriously slow at this compared to DGX etc.



