If you use the Bing chat interface and say "Can you draw me a picture of X?", it responds with "I’m sorry, but I’m not able to draw pictures. Is there anything else I can help you with?" followed immediately by "Your image is taking a while to generate. Check your image creation progress at Image Creator."
Looks like they might be using an LLM for the chat responses that isn't aware it has the ability to draw images, and in parallel another model that decides what to draw and show to the user.
I think this has to do with the verb "draw". The LLM is just saying it cannot draw. The image generation is likely a function it "calls". The LLM probably thinks of the image generator as a tool it uses, a separate entity from itself.
> The LLM probably thinks of the image generator as a tool it uses
I don’t think it’s correct to describe the LLM as "thinking" in this instance, not because of the usual philosophical objections, but because I suspect it is a bad heuristic for designing these kinds of prompts.
Probably. I’ve had limited success getting LLMs (trained on chat/instruct data) to output special codes indicating they’re communicating with a separate system (e.g. Google, Stable Diffusion), then capturing that output, running it through the external system, and feeding the result back to the user. Roughly the pattern sketched below.
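A minimal sketch of what I mean, with everything made up: the [[IMAGE: ...]] marker syntax and the generate_image backend are hypothetical stand-ins, not anything Bing actually uses.

    import re

    # Hypothetical special code the model is prompted to emit when it wants an image.
    TOOL_CALL = re.compile(r"\[\[IMAGE:(.*?)\]\]")

    def handle_turn(llm_output, generate_image):
        # If the model emitted the marker, dispatch the prompt to the image
        # backend (e.g. a Stable Diffusion API) and splice the result back
        # into the user-facing reply; otherwise pass the text through.
        match = TOOL_CALL.search(llm_output)
        if match is None:
            return llm_output
        prompt = match.group(1).strip()
        image_url = generate_image(prompt)
        return TOOL_CALL.sub(f"(image generated: {image_url})", llm_output)

    # Example: handle_turn("Sure! [[IMAGE: a cat in a hat]]", lambda p: "https://example.com/img.png")

The failure mode in the parent comment would then just be the chat model answering in text that it "can't draw" while the orchestration layer independently spots (or is told about) the image request and kicks off generation anyway.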