
In general you can swap B (billions of parameters) for GB of VRAM if you use q8 quantization, so an 8B model can probably just about work on 8GB of VRAM.


If you don't want to quantize at all, you need to double that for fp16, i.e. 16GB.


Yes, but I think it's standard to do inference at q8, not fp16.
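A rough back-of-envelope sketch of the rule of thumb above, assuming a dense model and counting only the weights (KV cache, activations, and framework overhead are ignored, so real usage will be somewhat higher):

    # Rough VRAM estimate for model weights only.
    # q8  ~= 1 byte per parameter, fp16 = 2 bytes per parameter.
    def weight_vram_gb(params_billions: float, bytes_per_param: float) -> float:
        return params_billions * 1e9 * bytes_per_param / 1024**3

    for name, bpp in [("q8", 1.0), ("fp16", 2.0)]:
        print(f"8B @ {name}: ~{weight_vram_gb(8, bpp):.1f} GB")

    # 8B @ q8:   ~7.5 GB  -> just about fits in 8GB VRAM
    # 8B @ fp16: ~14.9 GB -> needs roughly 16GB

This is why the "B -> GB" shorthand works at q8: one byte per parameter makes the parameter count and the weight footprint line up almost exactly.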



