Model weights are significantly larger than cache in almost all cases. Even an 8B parameter model is ~16 GB in half precision; GPU on-chip caches are tens of MB, nowhere near large enough to hold that.

Every weight has to be touched for every forward pass, meaning you have to wait for 16 GB to transfer from VRAM -> SRAM -> registers. That's not even close to 100 ns: on a 4090 with ~1 TB/s memory bandwidth it's about 16 milliseconds. PCIe latency to launch kernels or move 20 integers or whatever is functionally irrelevant on this scale.
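
Back-of-envelope in Python, using the rough figures above (8B params in fp16, ~1 TB/s of VRAM bandwidth; these are assumptions, exact numbers vary by card and model):

    # Time to stream all weights out of VRAM once per forward pass.
    # Figures are the rough assumptions from above, not measurements.
    params = 8e9                    # 8B parameter model
    bytes_per_param = 2             # fp16 / bf16
    weight_bytes = params * bytes_per_param   # ~16 GB
    bandwidth = 1e12                # ~1 TB/s VRAM bandwidth (4090 ballpark)
    t_pass_ms = weight_bytes / bandwidth * 1e3
    print(f"~{t_pass_ms:.0f} ms just to read the weights once")  # ~16 ms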

The real reason for batching is that it lets you reuse that gigantic VRAM->SRAM transfer across the batch and sequence dimensions. Instead of paying a 16 ms memory tax for each token, you pay it once for the whole batched forward pass.
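
And a sketch of the amortization, with made-up batch sizes just for illustration:

    # Same assumed figures as above; the batch sizes here are hypothetical.
    weight_bytes = 16e9             # ~16 GB of fp16 weights
    bandwidth = 1e12                # ~1 TB/s VRAM bandwidth
    sweep_ms = weight_bytes / bandwidth * 1e3   # ~16 ms per weight sweep
    for batch_tokens in (1, 8, 64, 256):
        per_token = sweep_ms / batch_tokens
        print(f"batch={batch_tokens:>3}: ~{per_token:.2f} ms of weight traffic per token")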

You've made several incorrect assumptions, and I'm not going to take the time to correct them all, so forgive the brevity. I'll just say that the 16 ms memory tax is wildly incorrect.

You either have a massive misconception of GPT-like decoder transformers or of how GPU data paths are architected, or you're trolling. Go talk to a modern reasoning model to get yourself some knowledge; it will be much better than what you appear to have.
