I've been trying to get the basujindal fork to work, but it seems to be putting all the work on the CPU. I've been running the example txt2img prompt for 30 minutes now and it still hasn't finished. It has reserved 4 GB of GPU memory, but the GPU doesn't appear to be doing any work; only the CPU is.
I've now done everything I could to constrain the memory usage of the original SD repo. I was finally able to get it to run, but it produced green squares as output :(
What I did:
- scripts/txt2img.py, in load_model_from_config (line 63): changed model.cuda() to model.cuda().half()
- removed the invisible watermarking
- reduced n_samples to 1
- reduced the resolution to 256x256
- removed the NSFW safety filter
I just can't get it to work, and it doesn't produce an error message or anything else I could use to debug it.
Your model is overflowing/underflowing and generating NaNs, which is what shows up as solid green squares. I got it working with the memory-optimised fork at an increased resolution (multiples of 32, e.g. 384x384) and full precision, while keeping it within 4 GB.
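To see why casting the whole model to half precision can collapse the output, here is a minimal sketch (using NumPy rather than the SD code itself, purely as an illustration): float16 tops out around 65504, so any intermediate activation beyond that overflows to inf, and the very next inf - inf or 0 * inf produces a NaN that propagates through the rest of the pipeline.

```python
import numpy as np

# float16 has a much smaller range than float32:
# max representable value is about 65504.
print(np.finfo(np.float16).max)  # 65504.0

x = np.float32(70000.0)  # fine in float32
h = np.float16(x)        # overflows to inf in float16

print(h)       # inf
print(h - h)   # nan -- inf minus inf is undefined
```

Once a single NaN appears in the latent, every value the decoder derives from it is also NaN, so the saved image degenerates into a uniform block. That is why running at full precision (or using per-layer autocast instead of a blanket .half()) avoids the green squares.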
https://github.com/basujindal/stable-diffusion
https://github.com/neonsecret/stable-diffusion
Or the hlky webui, which is optimized too:
http://rentry.co/kretard