I think their analysis relies on overlooking the obvious use for SRAM - caching DRAM data.
SRAM is for data that needs to be read/written/used very frequently - for example, read in 1 out of 10 clock cycles.
LLM weights are certainly not this. If a GPU is calculating 200 tokens per second, then most weights are only used 200 times per second. For a 1 GHz GPU, you're only using the data for 1 cycle out of 5,000,000! The rest of the time, that SRAM is wasted power, wasted silicon area, and eventually wasted dollars.
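To make that ratio concrete, here is the back-of-the-envelope arithmetic. The 1 GHz clock and 200 tokens/s figures are the ones assumed above; the assumption that each weight is read roughly once per generated token is mine, added for illustration.

```python
# Back-of-the-envelope: how often is a given weight actually touched?
# Assumptions (illustrative): 1 GHz clock, 200 tokens/s decode rate,
# and each weight read roughly once per generated token.
clock_hz = 1e9               # GPU clock: 1 GHz
tokens_per_s = 200           # decode throughput
uses_per_s = tokens_per_s    # one read of each weight per token
cycles_per_use = clock_hz / uses_per_s
print(f"Each weight is touched once every {cycles_per_use:,.0f} cycles")
# -> Each weight is touched once every 5,000,000 cycles
```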
Instead they should use SRAM for the intermediate results (i.e. the accumulators) of matrix multiplication - those end up being read/written every few cycles.
Weights should be streamed in from in-package DRAM. Activations too (but they are often used multiple times in quick succession, so it might make sense to cache them in SRAM).
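To make the data-movement pattern concrete, here is a minimal NumPy sketch of a tiled matrix multiply. It is not any real kernel, just an illustration of which buffers are hot (the accumulator tile, which you would keep in SRAM) and which are streamed through once (the weight tiles, which would come from in-package DRAM).

```python
import numpy as np

def tiled_matmul(activations, weights, tile=128):
    """Illustrative tiled matmul (not a real kernel).

    acc : read/modified on every k-step  -> hot, belongs in SRAM
    w_t : each weight tile read once     -> cold, streamed from DRAM
    """
    m, k = activations.shape
    k2, n = weights.shape
    assert k == k2
    out = np.zeros((m, n), dtype=np.float32)
    for j0 in range(0, n, tile):
        j1 = min(j0 + tile, n)
        # Accumulator tile: touched on every inner iteration.
        acc = np.zeros((m, j1 - j0), dtype=np.float32)
        for k0 in range(0, k, tile):
            k1 = min(k0 + tile, k)
            # Weight tile: read exactly once per output tile.
            w_t = weights[k0:k1, j0:j1]
            # Activation tile: reused again for the next output tile.
            a_t = activations[:, k0:k1]
            acc += a_t @ w_t
        out[:, j0:j1] = acc
    return out
```

The point is just the access pattern: `acc` is touched on every k-step, each `w_t` is read exactly once per output tile, and the activation tiles get reused across output tiles, which matches the caching suggestion above.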
I think it's all about the performance-to-cost ratio. The reason you need a cache is that you want to reduce the latency and power of accessing data. DRAM can also be thought of as a cache for disk drives - so why don't people use cheap disk drives for deep learning? Because they're way too slow.
Keeping weights in SRAM is more expensive than keeping them in DRAM; however, the latency and energy of streaming weights in from DRAM are even more expensive than that. LLM inference is heavily memory bound, and I guess that's why they use an expensive but faster memory.
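As a rough sanity check on how memory bound decoding is, here is a back-of-the-envelope sketch. The model size, precision, and decode rate are hypothetical round numbers assumed for illustration, not figures from this thread.

```python
# Rough illustration of why single-stream LLM decoding is memory bound.
# All numbers below are hypothetical round figures for illustration.
params = 70e9            # assume a 70B-parameter model
bytes_per_param = 2      # fp16/bf16 weights
tokens_per_s = 20        # assumed single-stream decode rate

# Every generated token reads (roughly) every weight once.
bytes_per_token = params * bytes_per_param
required_bw = bytes_per_token * tokens_per_s
print(f"Weight traffic per token: {bytes_per_token / 1e9:.0f} GB")
print(f"Bandwidth needed for {tokens_per_s} tok/s: {required_bw / 1e12:.1f} TB/s")
# ~140 GB per token, ~2.8 TB/s at 20 tok/s - on the order of an entire
# high-end accelerator's HBM bandwidth, so memory rather than FLOPs
# sets the speed limit.
```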
This might only make sense for companies like Google and Microsoft, who really need to run LLMs at millions of tokens per second and really care about the performance-to-cost ratio.