Qwen3.5 9b seems to be fairly competent at OCR and text formatting cleanup runni...

acters · 2026-03-08T06:34:21 1772951661

I have a 1660 Ti, and the CachyOS + aur/llama.cpp-cuda package is working fine for me. With about 5.3 GB of usable memory, I find that the 35B model is by far the most capable one that performs just as fast as the 4B model that fits entirely on my GPU. I did try the 9B model and it was surprisingly capable. However, the 35B is still better in some of my own anecdotal test cases. Very happy with the improvement. That said, I notice that Qwen 3.5 is about half the speed of Qwen 3.

AllegedAlec · 2026-03-09T09:16:26 1773047786

I found that the drivers I had were no longer compatible with the newer kernels. After upgrading to newer drivers it was able to offload again.

dunb · 2026-03-08T17:56:12 1772992572

Are you running with all the --fit options and it’s not working correctly? You could try looking at how many layers it is attempting to offload and manually adjust from there. Walk down --n-gpu-layers with a bash script until it loads.
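
A rough sketch of that walk-down, in case it helps. The probe command inside `try_load` is an assumption (a one-token `llama-cli` run with stdin closed); swap in whatever invocation you normally use. `MODEL`, `START_LAYERS` and `STEP` are hypothetical variable names made up for this example. It also assumes a failed allocation makes the process exit non-zero, which may not hold for every backend.

```shell
#!/usr/bin/env bash
# Probe: succeed if the model loads with $1 layers offloaded.
# ASSUMPTION: this exact llama-cli invocation is only a placeholder probe;
# replace it with the command you actually run.
try_load() {
  llama-cli -m "$MODEL" --n-gpu-layers "$1" -n 1 -p "hi" </dev/null >/dev/null 2>&1
}

# Start at $1 layers and step down by $2 (default 1) until try_load succeeds.
# Prints the first layer count that loads; returns 1 if even 0 layers fails.
walk_down_ngl() {
  local n=$1 step=${2:-1}
  while :; do
    if try_load "$n"; then
      echo "$n"
      return 0
    fi
    if [ "$n" -le 0 ]; then
      return 1
    fi
    n=$((n - step))
    # Clamp so that 0 layers (CPU only) is always tried last.
    if [ "$n" -lt 0 ]; then
      n=0
    fi
  done
}

# Only run when a model path is supplied (hypothetical variable names).
if [ -n "${MODEL:-}" ]; then
  walk_down_ngl "${START_LAYERS:-99}" "${STEP:-1}"
fi
```

Run it as something like `MODEL=/path/to/model.gguf START_LAYERS=40 ./walk.sh`. A larger `STEP` gets you a rough answer faster, and you can rerun with `STEP=1` from just above the value it prints.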

lioeters · 2026-03-08T12:30:09 1772973009

> GPU offloading working

I had this issue which in my case was solved by installing a newer driver. YMMV.

  sudo apt install nvidia-driver-570

WhyNotHugo · 2026-03-08T07:25:06 1772954706

If you’re building from source, the vulkan backend is the easiest to build and use for GPU offloading.
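
For reference, a minimal sketch of what that build usually looks like. It assumes you are inside a llama.cpp checkout with CMake and the Vulkan SDK (including `glslc`) already installed. The `run` helper and `DRY_RUN` variable are made up for this example so the commands can be previewed before running them.

```shell
# Print the command when DRY_RUN=1, otherwise execute it.
# (run and DRY_RUN are hypothetical helpers, not part of llama.cpp.)
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Configure with the Vulkan backend enabled, then build in Release mode.
build_vulkan() {
  run cmake -B build -DGGML_VULKAN=ON &&
  run cmake --build build --config Release
}
```

Call `build_vulkan` from the repository root; the binaries should end up under `build/bin/`.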

Curiositry · 2026-03-08T07:29:42 1772954982

Yes, that's what I tried first. Same issue with trying to allocate more memory than was available.