If you use the Bing chat interface and say "Can you draw me a picture of X?", it responds with "I’m sorry, but I’m not able to draw pictures. Is there anything else I can help you with?" followed immediately by "Your image is taking a while to generate. Check your image creation progress at Image Creator."

Looks like they might be using an LLM for the chat responses that isn't aware it can draw images, and in parallel another model that decides what to draw and show to the user.
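
Roughly the split I mean, as a toy Python sketch (every function here is a made-up stand-in for illustration, not anything Bing actually exposes):

    # Stand-in for the chat LLM, which doesn't know any image tool exists.
    def chat_reply(message: str) -> str:
        if "draw" in message.lower():
            return "I'm sorry, but I'm not able to draw pictures."
        return "Happy to help!"

    # Stand-in for a separate intent model watching the same message.
    def wants_image(message: str) -> bool:
        return any(w in message.lower() for w in ("draw", "picture", "image"))

    def handle(message: str) -> list[str]:
        replies = [chat_reply(message)]
        if wants_image(message):
            # Fires independently of the chat model, hence the contradiction.
            replies.append("Your image is taking a while to generate.")
        return replies

    print(handle("Can you draw me a picture of a cat?"))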


I try to avoid prompts like "Can you ...?" because they can be interpreted as yes/no questions rather than as commands to do something.

I've been prompting Bing with "Draw me an image of..." or even just "Image: image description" and it's worked well for me so far.


I think this has to do with the verb "draw". The LLM is just saying it cannot draw. The image generation is likely a function it "calls". The LLM probably thinks of the image generator as a tool it uses, a separate entity from itself.
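
What I mean by "calls", sketched in Python (the create_image schema here is invented for illustration; it's not Bing's actual tool format):

    import json

    def run_turn(model_output: str) -> str:
        # If the model emitted a structured tool call, route it to the
        # image backend; otherwise it's ordinary chat text.
        try:
            call = json.loads(model_output)
        except json.JSONDecodeError:
            return model_output  # plain reply, e.g. "I cannot draw"
        if isinstance(call, dict) and call.get("name") == "create_image":
            return f"[queued image: {call['args']['prompt']}]"
        return model_output

    print(run_turn('{"name": "create_image", "args": {"prompt": "a cat"}}'))
    print(run_turn("I'm sorry, but I'm not able to draw pictures."))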


> The LLM probably thinks of the image generator as a tool it uses

I don’t think it’s correct to describe the LLM as “thinking” in this instance, and not because of the usual philosophical objections, but because I suspect it’s a bad heuristic for designing these kinds of prompts.


As an alternative, I'll ask it to "reckon". For images, simply directing it to "create" an image suffices.

https://www.wordnik.com/words/reckon


Probably. I’ve had limited success getting chat/instruct-tuned LLMs to output special codes indicating they’re communicating with a separate system (e.g. Google, Stable Diffusion), then parsing those codes and feeding the results back to the user.
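
The sort of convention I've tried looks like this in Python (the <<tool:args>> syntax is my own invention, not any standard):

    import re

    TOOL_RE = re.compile(r"<<(\w+):(.*?)>>")

    def route(output: str) -> str:
        # Replace each embedded tool code with that tool's (stubbed)
        # result before the text reaches the user.
        def dispatch(m: re.Match) -> str:
            tool, arg = m.group(1), m.group(2)
            if tool == "search":
                return f"[search results for '{arg}']"
            if tool == "diffusion":
                return f"[image generated from '{arg}']"
            return m.group(0)  # unknown tool: leave the code visible
        return TOOL_RE.sub(dispatch, output)

    print(route("Here you go: <<diffusion:a red fox in snow>>"))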


It gives weird errors like that in the chat if it detects the output image as NSFW. Lots of false positives.

