This may be obvious to people who do this regularly, but what kind of machine is required to run this? I downloaded & tried it on my Linux machine that has a 16GB GPU and 64GB of RAM. This machine can run SD easily. But Qwen-image ran out of space both when I tried it on the GPU and on the CPU, so that's obviously not enough. But am I off by a factor of two? An order of magnitude? Do I need some crazy hardware?
> This may be obvious to people who do this regularly
This is not that obvious. Calculating VRAM usage for VLMs/LLMs is something of an arcane art. There are about 10 calculators online you can use and none of them work. Quantization, KV caching, activation, layers, etc all play a role. It's annoying.
But anyway, for this model, you need 40+ GB of VRAM. System RAM isn't going to cut it unless it's unified RAM on Apple Silicon, and even then, memory bandwidth is shot, so inference is much much slower than GPU/TPU.
Also I think you need a 40GB "card", not just 40GB of vram. I wrote about this upthread, you're probably going to need one card, I'd be surprised if you could chain several GPUs together.
Oh right, I forgot some diffusion models can't offload / split layers. I don't use vision generation models much at all - was just going off LLM work. Apologies for the potential misinformation.
Nah, that won’t gain you much (if anything?) over just doing the layer swaps on RAM. You can put the text encoder on the second card but you can also just put it in your RAM without much for negatives.
I believe it's roughly the same size as the model files. If you look in the transformers folder you can see there are around 9 5gb files, so I would expect you need ~45gb vram on your GPU. Usually quantized versions of models are eventually released/created that can run on much less vram but with some quality loss.
I've been bugging them about this for a while. There are repos that contain multiple model weights in a single repo which means adding up the file sizes won't work universally, but I'd still find it useful to have a "repo size" indicator somewhere.
HF does this for ggufs, and it’ll show you what quantizations will work on the GPU(s) you’ve selected. Hopefully that feature gets expanded to support more model types.
> I think the fact that, as far as I understand, it takes 40GB of VRAM to run, is probably dampening some of the enthusiasm.
For PCs I take it one that has two PCIe 4.0 x16 or more recent slots? As in: quite some consumers motherboards. You then put two GPU with 24 GB of VRAM each.
A friend runs this (don't know if the tried this Qwen-Image yet): it's not an "out of this world" machine.