
With GLIDE I think we've reached something of a plateau in terms of architecture on the "text to image generator S curve". DALL-E 2 has a very similar architecture to GLIDE and has some notable downsides (poorer language understanding).

glid-3 is a relatively small model trained by a single guy on his workstation (aka me), so it's not going to be as good. It's also not fully baked yet, so YMMV, although it really depends on the prompt. The new latent diffusion model is really amazing though, and is much closer to DALL-E 2 for 256px images.

I think the open source community will rapidly catch up with OpenAI in the coming months. The data, code, and compute are all there to train a model of similar size and quality.




Wow. Thanks for GLID-3. It was genuinely exciting for a few days but then I must admit latent diffusion stole my attention somewhat ;-)

What kind of prompts is GLID-3 especially good for? I remember getting lucky when I was playing around a few times but I didn't do it systematically.


glid-3 is trained specifically on photographic-style images, and is a bit better at generalization compared to the latent diffusion model.

e.g. prompt: "half human half Eiffel tower. A human Eiffel tower hybrid" (I get mostly normal Eiffel towers from LDM but some sensible results from glid-3)

glid-3 will be worse for things that require detailed recall, like a specific person.

With smaller models you kind of have to generate a lot of samples and pick out the best ones.
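The "generate a lot and pick the best" workflow can be sketched roughly as below. This is only an illustration: `best_of_n`, `generate`, and `score` are hypothetical names, not part of glid-3 or the latent diffusion code, and the scoring function (picking by eye, or something like an image-text similarity score) is an assumption left to the reader.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def best_of_n(
    generate: Callable[[], T],
    score: Callable[[T], float],
    n: int,
    keep: int,
) -> List[T]:
    """Draw n candidates and return the `keep` highest-scoring ones, best first."""
    if n < 1 or not 1 <= keep <= n:
        raise ValueError("need n >= 1 and 1 <= keep <= n")
    candidates = [generate() for _ in range(n)]
    return sorted(candidates, key=score, reverse=True)[:keep]


# Stand-ins so the sketch runs without a model: a "sample" is just a number
# and its "score" is the number itself. Both are hypothetical placeholders
# for a real sampler call and a real quality metric.
rng = random.Random(0)
picks = best_of_n(generate=rng.random, score=lambda x: x, n=16, keep=4)
print(picks)
```

In practice `generate` would wrap one sampling call for a fixed prompt, so the cost grows linearly with `n`.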


Thanks!

Do you happen to know how much GPU RAM I need to run glid-3 and/or the latent diffusion model, if I don't want to run on colab?


Just tried glid-3 with a batch size of one and I'm getting a peak of 4781 MiB. The latent diffusion model peaks at 8403 MiB.

These are fp16 numbers though, so you might need a recent NVIDIA card to run it.
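As a rough sanity check of those peaks against a particular card, the arithmetic looks like this. The 8 GiB card size and the `fits_on_card` helper are assumptions for illustration; real usable memory is usually somewhat lower than the nominal figure because the driver and display take a share.

```python
MIB_PER_GIB = 1024

# Peak fp16 usage at batch size one, as measured in the comment above.
PEAK_MIB = {"glid-3": 4781, "latent diffusion": 8403}


def fits_on_card(peak_mib: int, card_gib: int, headroom_mib: int = 0) -> bool:
    """Return True if the peak plus optional headroom fits in the card's nominal memory."""
    return peak_mib + headroom_mib <= card_gib * MIB_PER_GIB


# Hypothetical 8 GiB card (8192 MiB nominal).
for name, peak in PEAK_MIB.items():
    print(f"{name}: {peak} MiB, fits in 8 GiB: {fits_on_card(peak, 8)}")
```

By this estimate glid-3 leaves plenty of room on such a card, while the latent diffusion peak is about 200 MiB over, so it would need a smaller footprint or a larger card.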


I'll try them out. I have an RTX 2070, which apparently supports fp16, but it only has 8 GB of RAM.

I used the instructions here to check: https://github.com/wang-xinyu/tensorrtx/blob/master/tutorial...



