Not a 30 series, but on my 4090 I'm getting 32.0 tokens/s on a 13b q4_0 model (uses about 10GiB of VRAM) with full context (`-ngl 99 -n 2048 --ignore-eos` to force all layers onto GPU memory and run the full 2048-token context). As a point of reference, exllama [1] currently runs a 4-bit GPTQ of the same 13b model at 83.5 tokens/s.
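For anyone who wants to reproduce the numbers, the invocation looks roughly like this. The model path and prompt are placeholders, and you need llama.cpp built with cuBLAS for `-ngl` to actually offload anything:

```sh
# -ngl 99      -> offload all layers to the GPU
# -c 2048      -> full 2048-token context
# -n 2048      -> generate up to 2048 tokens
# --ignore-eos -> keep generating past end-of-text so the run fills the context
./main -m ./models/llama-13b/ggml-model-q4_0.bin \
    -ngl 99 -c 2048 -n 2048 --ignore-eos \
    -p "Once upon a time"
```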
Also, someone please let me use my 3050 4GB for something other than stable diffusion generation of silly thumbnail-sized pics. I'd be happy with an LLM that's specialized in insults and car analogies.
With llama.cpp you can split inference between the CPU and GPU, using however much VRAM you have available. And plenty of small models fit entirely in 4GB of VRAM; anything around 3B parameters quantized to 4-bit should be fine.
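As a rough sketch (the model filenames are placeholders, and the right `-ngl` value depends on how many layers actually fit in 4GB), it's just a matter of choosing how many layers to push to the GPU:

```sh
# A 3B q4_0 model should fit entirely in ~4GB of VRAM:
./main -m ./models/open-llama-3b/ggml-model-q4_0.bin -ngl 99 -p "Hello"

# Or split a 7B model: offload as many layers as fit, keep the rest on the CPU.
# Lower -ngl if you run out of VRAM; raise it if you have headroom.
./main -m ./models/llama-7b/ggml-model-q4_0.bin -ngl 20 -p "Hello"
```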
Cheaper Nvidia cards are generally considered to have dubious value. Having seen the benchmarks I agree, but it's not a game-changing difference, really. For CUDA and ML stuff, the 3050 would run circles around the 6600.
For CUDA and ML you'd be much better off choosing a 3060. Honestly, if you've only got the money for a 4GB 3050, you're probably better off working in Google Colab.
With layer offloading, I don't necessarily agree. Not being able to load an entire model into memory isn't a dealbreaker these days. You can even spill over into swap space if your drive is fast enough, so there's really no excuse not to use the hardware if you have it. Unless you just like the cloud and hate setting stuff up yourself, or what have you.
I'm curious how it compares to his Apple Silicon numbers.