The bottleneck for "easy integration" into games and applications right now is as much the RAM usage as it is the slowness. This would probably bring the speed to an acceptable level, but you would still have to hold the whole model in RAM.
That would make it a lot more feasible to run models in the cloud (triple-digit gigabytes of RAM is a lot more abundant than VRAM), but it wouldn't do much for consumer hardware.
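For a rough sense of why the numbers land in triple-digit-GB territory (my own back-of-the-envelope math, not from the comments above): a model's weight footprint is roughly parameter count times bytes per parameter, so even aggressive quantization keeps 100B+ models in the tens of gigabytes.

```python
# Back-of-the-envelope weight footprint: params * bytes per param.
# Approximate figures; real files add overhead for embeddings,
# quantization scales, and metadata.
def weight_footprint_gb(n_params: float, bits_per_param: float) -> float:
    return n_params * bits_per_param / 8 / 1e9

for bits in (16, 8, 4):
    print(f"100B params @ {bits}-bit: {weight_footprint_gb(100e9, bits):.0f} GB")
# -> ~200 GB (fp16), ~100 GB (int8), ~50 GB (4-bit)
```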
Interesting idea. Like texture streaming, you'd just stream in the parts of the model from disk to fill up all available RAM. If the NPC needed to think about something not cached in RAM, you'd throw up a "hmm, let me think about this" while stuff loads from disk.
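A minimal sketch of what that texture-streaming-style cache could look like, assuming the weights are pre-split into per-layer shard files (all names here are hypothetical, and a real engine would do this in native code with mmap and async I/O):

```python
import collections
import os

class LayerCache:
    """Keep as many weight shards in RAM as the budget allows,
    evicting least-recently-used shards when full."""

    def __init__(self, shard_dir: str, ram_budget_bytes: int):
        self.shard_dir = shard_dir
        self.budget = ram_budget_bytes
        self.used = 0
        self.cache: "collections.OrderedDict[int, bytes]" = collections.OrderedDict()

    def get_layer(self, idx: int) -> bytes:
        if idx in self.cache:              # cache hit: mark as recently used
            self.cache.move_to_end(idx)
            return self.cache[idx]
        # Cache miss: this is the "hmm, let me think about this" moment --
        # the game would show a thinking animation while the shard loads.
        path = os.path.join(self.shard_dir, f"layer_{idx}.bin")
        with open(path, "rb") as f:
            data = f.read()
        # Evict LRU shards until the new one fits in the budget.
        while self.used + len(data) > self.budget and self.cache:
            _, evicted = self.cache.popitem(last=False)
            self.used -= len(evicted)
        self.cache[idx] = data
        self.used += len(data)
        return data
```

In practice you'd also want prefetching (start loading likely-next layers before they're needed) so the "thinking" pauses stay short.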
Does this mean everyone could be running the 100B+ models from RAM?
This opens up a lot; some models could run very fast on small machines with this.
Bundling a small model inside a game to act as part of the mind for in-game NPCs (obviously with some tuning) becomes practical with this.
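As a rough illustration of what that could look like (a sketch using llama-cpp-python, which can run quantized GGUF models from system RAM; the model path, prompt format, and persona text are all made up):

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical setup: a small quantized model shipped with the game's assets.
npc_mind = Llama(model_path="assets/models/npc-small-q4.gguf", n_ctx=512)

def npc_reply(persona: str, player_line: str) -> str:
    # Simple prompt template; a shipped game would use a tuned model
    # and a much tighter format.
    prompt = f"{persona}\nPlayer: {player_line}\nNPC:"
    out = npc_mind(prompt, max_tokens=64, stop=["Player:"])
    return out["choices"][0]["text"].strip()

print(npc_reply("You are a gruff blacksmith in a mountain village.",
                "Can you repair my sword?"))
```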