Please say more if you can. How and why is ik_llama.cpp faster than mainline for the 27B dense model? I'd like to be able to run the 27B dense model faster on a 24GB VRAM GPU, and also on an M2 Max.


ik_llama.cpp was about 2x faster than mainline for CPU inference of Qwen3.5 until yesterday. Mainline then landed a PR that greatly increased speed for Qwen3.5, so ik_llama.cpp is now only about 10% faster on token generation.


