
Anyone get performance numbers for other 30 series cards? 3060 12gb?

I'm curious how it compares to his Apple Silicon numbers.



Not a 30 series, but on my 4090 I'm getting 32.0 tokens/s on a 13b q4_0 model (uses about 10GiB of VRAM) w/ full context (`-ngl 99 -n 2048 --ignore-eos` to force all layers on GPU memory and to do a full 2048 context). As a point of reference, currently exllama [1] runs a 4-bit GPTQ of the same 13b model at 83.5 tokens/s.
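For reference, a llama.cpp invocation along the lines of the one described above might look like this. The model path is a placeholder; the flags are the ones quoted in the comment plus llama.cpp's usual `-m`/`-c` options:

```shell
# Sketch of the benchmark run described above. Model path is illustrative.
# -ngl 99       offload all layers to GPU memory
# -n 2048       generate up to 2048 tokens
# --ignore-eos  don't stop at end-of-sequence (forces a full-length run)
# -c 2048       full 2048-token context window
./main -m ./models/13b-q4_0.bin -ngl 99 -n 2048 --ignore-eos -c 2048
```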

[1] https://github.com/turboderp/exllama


Also, someone please let me use my 3050 4GB for something else than stable diffusion generation of silly thumbnail-sized pics. I'd be happy with an LLM that's specialized in insults and car analogies.


You can split inference between CPU and GPU with llama.cpp, using whatever GPU VRAM is available. And you can run many small models in 4GB of VRAM: anything with ~3B parameters quantized to 4-bit should be fine.
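A minimal sketch of that CPU/GPU split, assuming a placeholder 3B model: llama.cpp's `-ngl` flag controls how many layers go to the GPU, and anything not offloaded runs on the CPU.

```shell
# Partial offload onto a 4GB card: put only some layers on the GPU,
# leave the rest on the CPU. Model path and layer count are placeholders;
# raise -ngl until you run out of VRAM.
./main -m ./models/3b-q4_0.bin -ngl 20 -p "Give me a car analogy."
```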


Do you have your settings correct? I have a 1650 on an old computer and I have generated 512x512 pictures and merged models.

Heck, even using CPU, I've been able to generate 512x512.

If you aren't generating 512x512 pictures, lmk, I'll go grab my bat file's startup parameters.


I can’t do 512x512 on a laptop 1050 ti 4gb


Are you using automatic1111? Do you have the --lowvram setting enabled?

Just to be clear, I can also make 512x512 with CPU, so you basically just need to have the correct config of the .bat file.
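For anyone following along, the startup parameters mentioned here live in automatic1111's webui-user.bat; this is a sketch of the low-VRAM config (the exact trade-off is that --lowvram is slowest but fits the least memory, --medvram is a lighter middle ground):

```shell
REM Excerpt of a webui-user.bat for low-VRAM cards (illustrative).
REM --lowvram aggressively swaps model parts out of VRAM; use
REM --medvram instead if the card can handle it.
set COMMANDLINE_ARGS=--lowvram
call webui.bat
```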


Did you buy that card for CUDA? Because otherwise I have no idea why someone would choose a 3050 over a 6600.


What's a 6600, some AMD card? And why is it better?


Cheaper Nvidia cards are generally considered to have dubious value. Having seen the benchmarks I agree, but it's not like a game-changing difference really. For CUDA and ML stuff, the 3050 would run circles around the 6600.


For CUDA and ML you'd be much better off choosing a 3060. Honestly, if you've only got the money for a 4GB 3050, you're probably better off working in Google Colab.


With layering enabled, I don't necessarily agree. Not being able to load an entire model into memory isn't a dealbreaker these days. You can even layer onto swap space if your drive is fast enough, so there's really no excuse not to use the hardware if you have it. Unless you just like the cloud and hate setting stuff up yourself, or what have you.


Yeah, for the price of a 4GB 3050 you could afford an 8GB RX 6600 XT, which is way faster.



