I checked the fine print on the product website: by “up to 4x faster LLM prompt processing,” they’re specifically referring to time to first token, not token generation rate (tokens per second).
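For concreteness, the two metrics are easy to measure separately in a streaming loop. Here’s a minimal sketch assuming llama-cpp-python and a local GGUF model (the model path is hypothetical):

```python
import time
from llama_cpp import Llama

llm = Llama(model_path="model.gguf")  # hypothetical local model path

prompt = "Summarize the history of the transistor."
start = time.perf_counter()
first_token_at = None
n_tokens = 0

# stream=True yields roughly one chunk per generated token
for chunk in llm(prompt, max_tokens=128, stream=True):
    if first_token_at is None:
        first_token_at = time.perf_counter()  # prompt processing ends here
    n_tokens += 1
end = time.perf_counter()

ttft = first_token_at - start  # time to first token: dominated by prefill
gen_rate = (n_tokens - 1) / max(end - first_token_at, 1e-9)  # steady-state decode speed

print(f"time to first token: {ttft:.2f}s, generation: {gen_rate:.1f} tok/s")
```

Time to first token is dominated by the compute-bound prefill over the whole prompt, while the steady-state rate reflects the memory-bandwidth-bound decode, which is why the two can improve independently.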
Yes, this is known. They added neural accelerators (a Tensor-core equivalent) to the GPU, which should make prompt processing competitive with similar-class GPUs.
It’s a big deal! Prompt processing was previously the Mac’s weak point. Sure, output generation speed matters when a coding model is reciting whole files back, but in general conversation I’d rather have it output a short answer anyway (after extensive processing by a smart model).