7B models are small enough to be usable on a smartphone, so a local handheld assistant sounds like a use case.



"usable" is not the same as practical.

Even running a quantized and optimized LLM on a smartphone would, at a minimum, kill battery life.


Try MLC-LLM. It's not as bad as you'd think.
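
For anyone curious, the desktop quick-start looks roughly like this (a sketch based on MLC-LLM's Python API; the prebuilt q4f16_1 Llama 2 model ID here is an assumption, and on a phone you'd use their MLCChat Android/iOS apps instead):

    from mlc_llm import MLCEngine

    # Prebuilt 4-bit quantized Llama 2 7B chat weights (model ID assumed)
    model = "HF://mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC"
    engine = MLCEngine(model)

    # OpenAI-style chat completion, streamed token by token
    for response in engine.chat.completions.create(
        messages=[{"role": "user", "content": "Summarize my last note."}],
        model=model,
        stream=True,
    ):
        for choice in response.choices:
            print(choice.delta.content or "", end="", flush=True)
    print()

    engine.terminate()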

In the future(?), they will probably use the dedicated AI/NPU blocks instead of the GPU; those blocks are very low power.


Are they? Unquantized, Llama 2 7B needs over 14GB of GPU (or shared) memory.


"Unquantized" is the key word here: with quantization you can get a 4-8x improvement without much performance degradation.



