
I love it. I wonder whether LLMs could be used to generate some pictures.


I think you are being downvoted because, technically, LLM stands for Large Language Model, while image generation is done by something like an LDM (latent diffusion model), which most people just call Stable Diffusion. If you had said the more generic ChatGPT, the answer would have been "maybe", since it's supposed to be multimodal. If you are asking whether Stable Diffusion could generate pictures based on the article description, the answer is most likely yes, provided the prompt captures the context correctly (maybe by using an LLM to write the prompt before feeding it into the next tool).

I hear almost everyone call everything ChatGPT no matter what they are using (MS, Google Gemini, Ollama, etc.): "ChatGPT made me a PP", "ChatGPT made this image". So if you want a general term, ChatGPT may be the more correct one. I typically say ChatGPT for LLMs and Stable Diffusion for image generation when talking to normal people, since most are familiar with the big two. Predicting the next pixel to draw is sort of like predicting the next word (token), I guess.

Most people know what you mean, but it's still good to use the correct term. Disclaimer, so I do not get downvoted with you: please do your own research, as I just wrote how I understand these concepts and am not an expert.


It's funny because LLMs are now also using diffusion.


LLMs cannot generate pictures.



