7B models are small enough to be usable on a smartphone, so a local handheld assistant sounds like a use case.



"usable" is not the same as practical.

Even running a quantized and optimized LLM on a smartphone would, at a minimum, kill battery life.


Try MLC-LLM. It's not as bad as you'd think.
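
For anyone curious, the desktop quick-start looks roughly like this (a sketch based on MLC-LLM's Python API; the prebuilt q4f16_1 Llama 2 model ID here is an assumption, and on a phone you'd use their MLCChat Android/iOS apps instead):

    from mlc_llm import MLCEngine

    # Prebuilt 4-bit quantized Llama 2 7B chat weights (model ID assumed)
    model = "HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC"
    engine = MLCEngine(model)

    # OpenAI-style chat completion, streamed token by token
    for response in engine.chat.completions.create(
        messages=[{"role": "user", "content": "Summarize my last note."}],
        model=model,
        stream=True,
    ):
        for choice in response.choices:
            print(choice.delta.content or "", end="", flush=True)
    print()

    engine.terminate()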

In the future(?), they will probably use the dedicated AI/NPU blocks instead of the GPU; those blocks are very low power.


Are they? Unquantized, Llama 2 7B needs over 14GB of GPU (or shared) memory.


"Unquantized" is the key word here: with quantization you can get a 4-8x improvement without much performance degradation.



