The bottleneck for "easy integration" into games and applications right now is as much the RAM usage as it is the slowness. This would probably bring the speed to an acceptable level, but you would still have to hold the whole model in RAM.
That would make it a lot more feasible to run models in the cloud (triple-digit gigabytes of RAM is a lot more abundant than VRAM), but it wouldn't do much for consumer hardware.
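For a rough sense of why the numbers land in triple-digit-GB territory (my own back-of-the-envelope math, not from the comments above): a model's weight footprint is roughly parameter count times bytes per parameter, so even aggressive quantization keeps 100B+ models in the tens of gigabytes.

```python
# Back-of-the-envelope weight footprint: params * bytes per param.
# Approximate figures; real files add overhead for embeddings,
# quantization scales, and metadata.
def weight_footprint_gb(n_params: float, bits_per_param: float) -> float:
    return n_params * bits_per_param / 8 / 1e9

for bits in (16, 8, 4):
    print(f"100B params @ {bits}-bit: {weight_footprint_gb(100e9, bits):.0f} GB")
# -> ~200 GB (fp16), ~100 GB (int8), ~50 GB (4-bit)
```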
Interesting idea. Like texture streaming, you'd just stream in the parts of the model from disk to fill up all available RAM. If the NPC needed to think about something not cached in RAM, you'd throw up a "hmm, let me think about this" while stuff loads from disk.
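A minimal sketch of what that texture-streaming-style cache could look like, assuming the weights are pre-split into per-layer shard files (all names here are hypothetical, and a real engine would do this in native code with mmap and async I/O):

```python
import collections
import os

class LayerCache:
    """Keep as many weight shards in RAM as the budget allows,
    evicting least-recently-used shards when full."""

    def __init__(self, shard_dir: str, ram_budget_bytes: int):
        self.shard_dir = shard_dir
        self.budget = ram_budget_bytes
        self.used = 0
        self.cache: "collections.OrderedDict[int, bytes]" = collections.OrderedDict()

    def get_layer(self, idx: int) -> bytes:
        if idx in self.cache:              # cache hit: mark as recently used
            self.cache.move_to_end(idx)
            return self.cache[idx]
        # Cache miss: this is the "hmm, let me think about this" moment --
        # the game would show a thinking animation while the shard loads.
        path = os.path.join(self.shard_dir, f"layer_{idx}.bin")
        with open(path, "rb") as f:
            data = f.read()
        # Evict LRU shards until the new one fits in the budget.
        while self.used + len(data) > self.budget and self.cache:
            _, evicted = self.cache.popitem(last=False)
            self.used -= len(evicted)
        self.cache[idx] = data
        self.used += len(data)
        return data
```

In practice you'd also want prefetching (start loading likely-next layers before they're needed) so the "thinking" pauses stay short.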
Does this mean everyone could be running the 100B+ models from RAM?
This opens up a lot; some models could run very fast on small machines with this.
Bundling a small model inside a game to act as part of the mind for in-game NPCs (obviously with some tuning) becomes practical with this.
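As a rough illustration of what that could look like (a sketch using llama-cpp-python, which can run quantized GGUF models from system RAM; the model path, prompt format, and persona text are all made up):

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical setup: a small quantized model shipped with the game's assets.
npc_mind = Llama(model_path="assets/models/npc-small-q4.gguf", n_ctx=512)

def npc_reply(persona: str, player_line: str) -> str:
    # Simple prompt template; a shipped game would use a tuned model
    # and a much tighter format.
    prompt = f"{persona}\nPlayer: {player_line}\nNPC:"
    out = npc_mind(prompt, max_tokens=64, stop=["Player:"])
    return out["choices"][0]["text"].strip()

print(npc_reply("You are a gruff blacksmith in a mountain village.",
                "Can you repair my sword?"))
```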