Really old GPUs were different, but the 1080 is similar to later stuff with a few features missing: fast half precision and "tensor cores", iirc. It could be that the most recent stuff has changed more (I haven't paid attention), but I thought the 4090 was just another evolutionary step.
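If you want to see what a given card actually has, the CUDA runtime will tell you. Rough sketch, plain runtime API; the numbers in the comments are from memory:

```cuda
// nvcc query.cu -o query
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);  // device 0
    // A GTX 1080 (Pascal) reports 6.1: no tensor cores, and FP16 math runs at
    // a tiny fraction of the FP32 rate on that chip. Tensor cores show up at
    // compute capability 7.0 (Volta) and later.
    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    printf("VRAM: %.1f GiB, bus width: %d-bit\n",
           prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0),
           prop.memoryBusWidth);
    return 0;
}
```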
Everyone, and I mean everyone, I know doing AI/ML work values VRAM above all. The absolute best bang for the buck is buying used P40s, and if you actually want those cards to be usable for other stuff, used 3090s are the best deal around; they should be ~$700 right now.
Well, to give an example, 32GB of VRAM would be vastly preferable to 24GB of higher-bandwidth VRAM. You really need to be able to fit the entire LLM in GPU memory for best results, because otherwise you're bottlenecked on the transfer speed between regular old system RAM and the GPU.
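Back-of-the-envelope version of that, with made-up but typical numbers (a 13B model in fp16 on a 24GB card), counting only the weights and ignoring KV cache and activations:

```cuda
#include <cstdio>

int main() {
    // Hypothetical example numbers, weights only.
    const double params_billion  = 13.0;  // 13B-parameter model
    const double bytes_per_param = 2.0;   // fp16/bf16
    const double vram_gb         = 24.0;  // e.g. a 3090
    const double weights_gb = params_billion * bytes_per_param;  // ~26 GB
    printf("weights ~%.0f GB vs %.0f GB VRAM -> %s\n",
           weights_gb, vram_gb,
           weights_gb <= vram_gb ? "fits" : "spills to system RAM (slow path)");
    return 0;
}
```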
You'll also note that M1/M2 Macs with large amounts of system memory are good at inference, because the on-die GPU has a very high-speed connection to the soldered-on RAM. It's all about avoiding bottlenecks wherever possible.
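Same idea in numbers: during decode, every generated token streams the whole set of weights through the memory bus once, so bandwidth puts a hard ceiling on tokens/sec. Rough figures from memory (3090 ~936 GB/s, M1 Max unified memory ~400 GB/s, PCIe 4.0 x16 ~32 GB/s), same hypothetical 13B fp16 model as above:

```cuda
#include <cstdio>

int main() {
    const double weights_gb  = 26.0;   // hypothetical 13B model at fp16
    const double gpu_bw_gbs  = 936.0;  // RTX 3090 GDDR6X, roughly
    const double mac_bw_gbs  = 400.0;  // M1 Max unified memory, roughly
    const double pcie_bw_gbs = 32.0;   // PCIe 4.0 x16, per direction, roughly
    // tokens/s ceiling = bandwidth / bytes streamed per token
    printf("3090, weights in VRAM: ~%.0f tok/s max\n", gpu_bw_gbs / weights_gb);
    printf("M1 Max unified memory: ~%.0f tok/s max\n", mac_bw_gbs / weights_gb);
    printf("weights over PCIe:     ~%.1f tok/s max\n", pcie_bw_gbs / weights_gb);
    return 0;
}
```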
Not really any paradigm shift since the introduction of Tensor Cores in NVIDIA archs. Anything Ampere or Lovelace will do to teach yourself CUDA, up to the crazy optimization techniques and the worst mind-warping libraries. You'll only miss out on HBM, which lets you cheat on memory bandwidth; on the amount of VRAM (teach yourself on smaller models...); and on double-precision perf and double-precision tensor cores (go for an A30 then, and I'm not sure they'll keep those around - either the x30 bin or DP tensor cores - ever since "DGEMM on Integer Matrix Multiplication Unit", https://arxiv.org/html/2306.11975v4 ). FP4, DPX, TMA, and GPUDirect are nice, but you must be pretty far out already for them to be mandatory...
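For a first taste of the tensor-core side of that, the WMMA API is the gentlest entry point. Minimal sketch: one warp computing a single 16x16x16 fp16 tile, needs sm_70 or later (Volta/Turing/Ampere/Lovelace):

```cuda
// nvcc -arch=sm_70 wmma_demo.cu -o wmma_demo
#include <cstdio>
#include <cuda_fp16.h>
#include <mma.h>

using namespace nvcuda;

// One warp computes C = A * B for a single 16x16x16 tile on the tensor cores.
__global__ void tile_mma(const half *a, const half *b, float *c) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

    wmma::fill_fragment(c_frag, 0.0f);
    wmma::load_matrix_sync(a_frag, a, 16);           // leading dimension 16
    wmma::load_matrix_sync(b_frag, b, 16);
    wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // the tensor-core op
    wmma::store_matrix_sync(c, c_frag, 16, wmma::mem_row_major);
}

int main() {
    const int n = 16 * 16;
    half *a, *b; float *c;
    cudaMallocManaged(&a, n * sizeof(half));
    cudaMallocManaged(&b, n * sizeof(half));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = __float2half(1.0f); b[i] = __float2half(1.0f); }

    tile_mma<<<1, 32>>>(a, b, c);   // one warp
    cudaDeviceSynchronize();
    printf("c[0] = %.1f (expect 16.0)\n", c[0]);  // row of ones dot column of ones

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

From there it's shared-memory staging, async copies, pipelining, and so on, which is exactly the crazy-optimization territory mentioned above.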