

malf on June 13, 2023 | on: Llama.cpp: Full CUDA GPU Acceleration

I think it was a 2x total speedup vs. the previous version, which already used the GPU for "most" things, so the real speedup of the newly accelerated part is 2/(1-most), which could be a lot.
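The estimate above can be sanity-checked against Amdahl's law. The sketch below is an illustration under stated assumptions, not the commenter's formula: it treats `fraction` as the share of the previous version's runtime spent in the part that was newly moved to the GPU (roughly 1 - most, measured in time rather than in number of operations), and solves for the speedup that part would need in order to yield a given total speedup. The function name and the sample fractions are hypothetical.

```python
def component_speedup(total_speedup: float, fraction: float) -> float:
    """Speedup of the newly accelerated part implied by Amdahl's law.

    Old runtime is normalized to 1. The untouched part takes (1 - fraction);
    the accelerated part shrinks from `fraction` to `fraction / s`, so
        1 / total_speedup = (1 - fraction) + fraction / s
    Solving for s gives fraction / (fraction - (1 - 1 / total_speedup)).
    """
    if total_speedup <= 0.0:
        raise ValueError("total_speedup must be positive")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")

    # Runtime saved overall, as a share of the old runtime.
    saved = 1.0 - 1.0 / total_speedup
    # Time the accelerated part still takes after the change.
    remaining = fraction - saved
    if remaining <= 0.0:
        # The part was too small to account for this much total speedup.
        raise ValueError("fraction too small for the given total speedup")
    return fraction / remaining


if __name__ == "__main__":
    # Hypothetical time shares for the newly accelerated part.
    for f in (1.0, 0.75, 0.6, 0.51):
        print(f"fraction={f:.2f} -> component speedup ~{component_speedup(2.0, f):.1f}x")
```

Under this model, a 2x total speedup is only possible if the newly accelerated part took more than half of the old runtime, and the implied speedup of that part grows without bound as its share approaches one half, which is consistent with the "could be a lot" intuition.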

supermatt on June 14, 2023

Thanks - that makes more sense. It wasn't clear from the article.
