4x faster is about token prefill, i.e. the time to first token. It should be on ...

		storus 12 days ago \| on: MacBook Pro with M5 Pro and M5 Max

		The "4x faster" figure is about token prefill, i.e. the time to first token. It should be on par with DGX Spark there, while being only slightly faster than the M4 for token generation. In other words, with a long context you wait about 4 minutes instead of 15.
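
To make the arithmetic concrete, here is a rough sketch of how a 4x prefill speedup turns a 15-minute wait into roughly 4 minutes. The prompt size and baseline prefill rate are hypothetical values, picked only so the baseline comes out to 15 minutes; they are not measurements of any machine.

```python
def time_to_first_token(prompt_tokens, prefill_tokens_per_s):
    """Seconds spent on prefill before the first output token appears."""
    return prompt_tokens / prefill_tokens_per_s


# Hypothetical numbers, chosen only so the baseline matches a 15-minute wait.
PROMPT_TOKENS = 90_000      # assumed long-context prompt size
BASELINE_PREFILL = 100.0    # assumed baseline prefill rate, tokens/s
SPEEDUP = 4.0               # the "4x faster" prefill claim

baseline_s = time_to_first_token(PROMPT_TOKENS, BASELINE_PREFILL)
faster_s = time_to_first_token(PROMPT_TOKENS, BASELINE_PREFILL * SPEEDUP)

print(f"{baseline_s / 60:.2f} min -> {faster_s / 60:.2f} min")
# prints "15.00 min -> 3.75 min"
```

Only the prefill term shrinks here. Token generation after the first token is a separate rate, which is the part expected to be just slightly faster.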
