
I don't see any explanation for why they trained an 8B model instead of 7B. I thought that if you have a 16GB GPU, you can fit a 14GB (7B * 16 bits) model into it, but how does it fit if the model is exactly 16GB?
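Here's the back-of-the-envelope math I'm doing, as a rough Python sketch (it only counts raw weights and ignores the KV cache and other runtime overhead):

  # Rough weight-memory estimate: parameter count * bytes per parameter.
  # Ignores KV cache, activations, and framework overhead.
  def weight_gb(params_billions, bits_per_weight):
      return params_billions * 1e9 * bits_per_weight / 8 / 1e9  # decimal GB

  print(weight_gb(7, 16))  # ~14 GB -> just squeezes onto a 16GB card
  print(weight_gb(8, 16))  # ~16 GB -> no room left for anything else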



The bigger size probably comes from the larger vocabulary in the tokenizer. But most people run this model quantized, at least to 8 bits, and it still holds up reasonably well down to 3-4 bpw.
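To put rough numbers on both points, a quick Python sketch (the vocab and hidden sizes below are illustrative assumptions, not the model's published config):

  # How vocab size and quantization change the memory picture.
  def embed_params(vocab_size, hidden_dim, tied=False):
      # Input embedding plus (if untied) the output projection.
      n = vocab_size * hidden_dim
      return n if tied else 2 * n

  def weight_gb(n_params, bits_per_weight):
      return n_params * bits_per_weight / 8 / 1e9  # decimal GB

  # Growing from a ~32k to a ~128k vocab at hidden_dim=4096 (assumed values)
  # adds roughly 0.8B parameters on its own:
  extra = embed_params(128_000, 4096) - embed_params(32_000, 4096)
  print(f"extra params from the bigger vocab: {extra / 1e9:.2f}B")

  # An 8B model at different bit widths:
  for bpw in (16, 8, 6, 4, 3):
      print(f"{bpw:2d} bpw -> {weight_gb(8e9, bpw):4.1f} GB")
  # 8 bpw is ~8 GB and 4 bpw is ~4 GB, so it fits comfortably on a 16GB card.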


> The bigger size is probably from the bigger vocabulary in the tokenizer.

How does that affect anything? It still uses 16-bit floats in the model, doesn't it?


Upgrade to a 24GB GPU?


Any recommendations?


3090, trivially.

No reason to go with a 4090, as it's no more capable for this, and the 5090 probably won't have more than 24GB either, simply because Nvidia wants to maintain its margins through market segmentation (adding more VRAM to that card would obsolete their low-end enterprise AI cards that cost $6000+).


Appreciate the info!

In another thread I saw a recommendation for dual 3090s if you're not doing anything gaming-related, so it's good to have some confirmation here.


I'd also consider dual A6000-48GB (96GB total) if you have a budget of $8000, or dual V100-32GB (64GB total) if you have a budget of $4000.

The V100 is older and slower, but for AI applications VRAM is king, and there are lots of enterprise V100s coming off racks and being sold cheap on eBay.



