One big bottleneck is SRAM cost. Even an 8B-parameter model would probably cost hundreds of dollars to run locally on that kind of hardware. That's especially unpalatable if model quality keeps advancing year over year.
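Rough back-of-envelope for the SRAM point, as a minimal sketch. It assumes "8b" means 8 billion parameters and that every weight sits in on-chip SRAM. The dollars-per-MiB figure is a placeholder made up for illustration, not a real price, so plug in your own number.

```python
# Back-of-envelope: cost of holding all model weights in on-chip SRAM.
# Every number passed in below is an assumption, not a quoted price.

MIB = 2**20


def weight_bytes(n_params: float, bits_per_weight: int) -> float:
    """Bytes needed to store the weights alone (no KV cache, no activations)."""
    return n_params * bits_per_weight / 8


def sram_cost_usd(n_params: float, bits_per_weight: int, usd_per_mib: float) -> float:
    """Weight storage cost given a hypothetical SRAM price per MiB."""
    return weight_bytes(n_params, bits_per_weight) / MIB * usd_per_mib


if __name__ == "__main__":
    n_params = 8e9       # assumed: "8b" = 8 billion parameters
    usd_per_mib = 0.05   # hypothetical placeholder, NOT a real quote
    for bits in (16, 8, 4):
        gib = weight_bytes(n_params, bits) / 2**30
        cost = sram_cost_usd(n_params, bits, usd_per_mib)
        print(f"{bits:>2}-bit: {gib:6.2f} GiB of weights, ~${cost:,.0f} at ${usd_per_mib}/MiB")
```

The estimate scales linearly with parameter count and bits per weight, so each halving of precision halves the bill. Whether the total lands in the "hundreds of dollars" range depends entirely on the per-MiB figure you assume.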
> Or we develop a new silicon process that can mimic synaptic weights in biology. Synapses have plasticity.
It's amazing to me that people consider this to be more realistic than FAANG collaborating on a CUDA-killer. I guess Nvidia really does deserve their valuation.
I mean, if it were small enough to fit in an iPhone, why not? Every year you would fabricate a new chip with the best model. They already do this with the camera pipeline chips.
This exists[0], but the chip in question is physically large and won't fit in a phone.
[0] https://www.anuragk.com/blog/posts/Taalas.html