
Use half-precision floats and/or the optimized forks:

https://github.com/basujindal/stable-diffusion

https://github.com/neonsecret/stable-diffusion

Or the hlky webui, which is also optimized:

http://rentry.co/kretard
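The main saving is simple: half-precision floats take two bytes per value instead of four, so weights and activations need roughly half the VRAM. A minimal NumPy sketch of the size difference (the array shape is just an illustration, not the real model's):

```python
import numpy as np

# Half precision stores each value in 2 bytes instead of 4,
# so the same tensor takes half the memory.
fp32 = np.zeros((4, 64, 64), dtype=np.float32)  # toy latent-sized array
fp16 = fp32.astype(np.float16)
print(fp32.nbytes, fp16.nbytes)
```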




I've been trying to get the basujindal fork to work, but it seems to be putting all the work on the CPU. I've been running the example txt2img prompt for 30 minutes now and it still hasn't finished. It has reserved 4 GB of GPU memory, but the GPU doesn't appear to be doing any work; only the CPU is busy.


Use the original SD repo, but modify txt2img.py according to:

https://github.com/CompVis/stable-diffusion/issues/86#issuec...


I've now done everything I can to constrain the memory usage of the original SD repo. I finally got it to run, but it produced green squares as output :(

What I did:

- scripts/txt2img.py, function load_model_from_config, line 63: changed model.cuda() to model.cuda().half()

- removed the invisible watermarking

- reduced n_samples to 1

- reduced the resolution to 256x256

- removed the NSFW filter

I just can't get it to work, and it doesn't produce an error message or anything else I could use to debug it.
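Since it fails silently, one option is to add a guard before the image is written out, so a NaN result raises an error instead of saving a blank image. A sketch using NumPy; the function name and where you'd call it (on the decoded samples, right before saving) are hypothetical:

```python
import numpy as np

def check_finite(x, name="samples"):
    # Fail loudly instead of silently writing out a green/black image.
    if not np.isfinite(x).all():
        raise ValueError(f"{name} contain NaN/Inf - likely fp16 overflow")
    return x

# e.g. call on the decoded sample array right before it is saved
check_finite(np.zeros((1, 3, 256, 256), dtype=np.float16))
```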


Your model is overflowing/underflowing and generating NaNs. I got it working with the memory-optimised fork, an increased resolution (multiples of 32, e.g. 384x384), and full precision, while staying within 4 GB.
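float16 tops out around 65504, so a large intermediate activation silently becomes inf, and once an inf flows through later ops it turns into NaN; decoded to an image, that shows up as the flat green squares described above. A minimal NumPy illustration of the mechanism:

```python
import numpy as np

# float16 max is ~65504; anything larger silently overflows to inf.
a = np.float16(60000)
b = np.float16(a + a)  # 120000 overflows to inf
c = b - b              # inf - inf = nan
print(b, c)
```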





