
> Forget GPUs, this thing is plenty fast on CPUs.

Does this mean everyone could be running the 100B+ models from RAM?

This opens up a lot: some models could run very fast on small machines with this.

Bundling a small model inside a game to act as part of the mind of in-game NPCs (obviously with some tuning) becomes practical with this.




The bottleneck for "easy integration" into games and applications right now is as much the RAM usage as it is the slowness. This would probably bring the speed to an acceptable level, but you would still have to hold the whole model in RAM.

That would make it a lot more feasible to run models in the cloud (triple-digit gigabytes of RAM are a lot more abundant than VRAM), but it wouldn't do that much for consumer hardware.


I wonder if the model takes similar branches while in the same context. If so, you could fault in parts of the model from disk as needed.
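As a minimal sketch of that idea (the file name "model.bin", the flat float32 layout, and the block granularity are all assumptions for illustration), you could memory-map the weight file and let the OS fault pages in only when a block is actually touched:

    import numpy as np

    BLOCK = 4096  # weights per block; hypothetical granularity

    # Map the weight file without reading it up front; the OS pages
    # in only the parts that are actually accessed.
    weights = np.memmap("model.bin", dtype=np.float32, mode="r")

    def load_block(i):
        """Copy one block into RAM; only now does the OS read it from disk."""
        return np.asarray(weights[i * BLOCK:(i + 1) * BLOCK])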


Interesting idea. Like texture streaming, you'd just stream in the parts of the model from disk to fill up all available RAM. If the NPC needed to think about something not cached in RAM, you'd throw up a "hmm, let me think about this" while stuff loads from disk.
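A rough sketch of that streaming scheme (the block size, RAM budget, and the load_block helper above are all made up for illustration): keep an LRU cache of weight blocks, serve hits from RAM, and treat a miss as the "hmm, let me think about this" moment while the block loads from disk.

    from collections import OrderedDict

    class BlockCache:
        """LRU cache of weight blocks, analogous to a texture streaming pool."""

        def __init__(self, load_block, max_blocks):
            self.load_block = load_block    # reads one block from disk (e.g. the memmap sketch above)
            self.max_blocks = max_blocks    # RAM budget, in blocks
            self.cache = OrderedDict()

        def get(self, i):
            if i in self.cache:
                self.cache.move_to_end(i)   # mark as most recently used
                return self.cache[i]
            block = self.load_block(i)      # cache miss: the "let me think" pause
            self.cache[i] = block
            if len(self.cache) > self.max_blocks:
                self.cache.popitem(last=False)  # evict the least recently used block
            return block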



