I don't see any explanation for why they trained 8B instead of 7B.
I thought that if you have a 16GB GPU, you can fit a 14GB model (7B params × 2 bytes at 16-bit) into it, but how does it fit if the model is exactly 16GB?
The bigger size probably comes from the larger vocabulary in the tokenizer. But most people run this model quantized to at least 8 bits, and it still holds up reasonably well down to 3-4 bpw.
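To make the arithmetic above concrete, here's a rough sketch of the weight-memory math (weights only; activations and KV cache add real overhead on top, so a 16GB model does not actually fit in 16GB of VRAM):

```python
# Approximate VRAM for model weights alone: params * bits / 8.
# This deliberately ignores activation/KV-cache overhead.
def weights_gb(params_billion: float, bits: int) -> float:
    # 1B params at 8 bits is roughly 1 GB
    return params_billion * bits / 8

for bits in (16, 8, 4):
    print(f"8B model at {bits}-bit: ~{weights_gb(8, bits):.0f} GB")
# 16-bit weights alone already equal a 16GB card's capacity,
# which is why quantized (8-bit or 3-4 bpw) variants are popular.
```

By the same formula, a 7B model at 16 bits comes to ~14GB of weights, matching the figure quoted above.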
No reason to go for a 4090, as it's no more capable for this (same 24GB of VRAM), and the 5090 probably won't have more than 24GB either, simply because Nvidia wants to maintain its margins through market segmentation: adding more VRAM to that card would obsolete their low-end enterprise AI cards that cost $6000+.
I'd also consider dual A6000-48GB (96GB total) if you have a budget of $8000 or dual V100-32GB (64GB) if you have a budget of $4000.
The V100 is older and slower, but for AI applications VRAM is king, and there are lots of enterprise V100s coming off racks and being sold cheap on eBay.