I've been trying to get the basujindal fork to work, but it seems to be putting all the work on the CPU. I've been running the example txt2img prompt for 30 minutes now and it still hasn't finished. It has reserved 4 GB of GPU memory, but the GPU doesn't appear to be doing any work; only the CPU is.
I've now done everything I could to constrain the memory usage of the original SD repo. I was finally able to get it to run, but it produced green squares as output :(
What I did:
- scripts/txt2img.py, in load_model_from_config (line 63): changed model.cuda() to model.cuda().half()
- removed the invisible watermarking
- reduced n_samples to 1
- reduced the resolution to 256x256
- removed the NSFW safety filter
I just can't get it to work, and it doesn't produce an error message or anything else I could use to debug it.
Your model is overflowing/underflowing and generating NaNs, which is what shows up as solid green squares. I got it working with the memory-optimised fork at an increased resolution (multiples of 32, e.g. 384x384) and full precision, while keeping it within 4 GB.
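To see why casting the whole model to half precision can collapse the output, here is a minimal sketch (using NumPy rather than the SD code itself, purely as an illustration): float16 tops out around 65504, so any intermediate activation beyond that overflows to inf, and the very next inf - inf or 0 * inf produces a NaN that propagates through the rest of the pipeline.

```python
import numpy as np

# float16 has a much smaller range than float32:
# max representable value is about 65504.
print(np.finfo(np.float16).max)  # 65504.0

x = np.float32(70000.0)  # fine in float32
h = np.float16(x)        # overflows to inf in float16

print(h)       # inf
print(h - h)   # nan -- inf minus inf is undefined
```

Once a single NaN appears in the latent, every value the decoder derives from it is also NaN, so the saved image degenerates into a uniform block. That is why running at full precision (or using per-layer autocast instead of a blanket .half()) avoids the green squares.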
https://github.com/basujindal/stable-diffusion
https://github.com/neonsecret/stable-diffusion
Or the hlky webui, which is optimized too:
http://rentry.co/kretard