Wow, this is very timely! I just finished a script that uses ChatGPT (via the OpenAI APIs) to read my customer support messages on Etsy and generate a response. Since I often send and receive images via Etsy support (my customers can customize the product with images), I have been searching for a way to let ChatGPT "know" what the image is. Currently the script just inserts the text "<uploaded image>". I had been hacking together something using stable-diffusion-webui's API (interrogate using CLIP), but was struggling with a few things. I took a break to browse HN and this pops up!
I will definitely be taking a look to see how this works and will try to get it integrated with my script.
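For anyone wiring up something similar, here is a minimal sketch of the placeholder swap described above. It assumes the message text contains one literal "<uploaded image>" marker per image, in order; the function name `insert_captions` and the "[image: ...]" output format are hypothetical, not taken from the actual script.

```python
PLACEHOLDER = "<uploaded image>"


def insert_captions(message: str, captions: list[str]) -> str:
    """Replace each placeholder, in order, with a bracketed caption.

    Placeholders beyond the number of captions supplied are left untouched.
    """
    parts = message.split(PLACEHOLDER)
    out = [parts[0]]
    for i, tail in enumerate(parts[1:]):
        if i < len(captions):
            # Hypothetical format; pick whatever reads well in the prompt.
            out.append(f"[image: {captions[i]}]")
        else:
            out.append(PLACEHOLDER)
        out.append(tail)
    return "".join(out)
```

The captions themselves would come from whichever image-description step ends up working.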
CLIP is also available on its own (as are BLIP, etc.) via Python; useful for "output a CSV containing filename, filesize, and description of every image in this directory".
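A rough sketch of that CSV idea, using only the standard library. The captioning step is deliberately left as a pluggable callable: `placeholder_describe` is a hypothetical stand-in, and a real CLIP/BLIP call would be swapped in there. The function names and the list of image suffixes are assumptions for illustration.

```python
import csv
from pathlib import Path
from typing import Callable

# Assumed set of extensions to treat as images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def placeholder_describe(path: Path) -> str:
    """Stand-in captioner; swap in a real CLIP/BLIP call here."""
    return f"(no caption for {path.name})"


def write_image_index(
    directory: str,
    out_csv: str,
    describe: Callable[[Path], str] = placeholder_describe,
) -> int:
    """Write filename, filesize (bytes), description per image; return row count."""
    rows = 0
    with open(out_csv, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["filename", "filesize", "description"])
        for path in sorted(Path(directory).iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                writer.writerow([path.name, path.stat().st_size, describe(path)])
                rows += 1
    return rows
```

Usage would be something like `write_image_index("photos", "index.csv", describe=my_captioner)`, where `my_captioner` takes a `Path` and returns a string.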
Is the stable-diffusion-webui API... stable yet? I was looking at the JSON it's chucking back and forth between the frontend and the Python backend, and it looks like it could be parsed without a library.
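On the "without a library" point, a sketch using only Python's standard library. The endpoint path `/sdapi/v1/interrogate`, the `image`/`model` request fields, the `caption` response field, and the default address are all assumptions based on the webui being started with its API enabled; they may differ between versions, so check them against the API docs page your own instance serves.

```python
import base64
import json
import urllib.request


def build_interrogate_payload(image_bytes: bytes, model: str = "clip") -> bytes:
    """Encode the image as base64 and wrap it in the assumed JSON request body."""
    body = {"image": base64.b64encode(image_bytes).decode("ascii"), "model": model}
    return json.dumps(body).encode("utf-8")


def parse_caption(response_body: bytes) -> str:
    """Pull the assumed 'caption' field out of the JSON response."""
    return json.loads(response_body).get("caption") or ""


def interrogate(image_path: str, base_url: str = "http://127.0.0.1:7860") -> str:
    """POST an image to the (assumed) interrogate endpoint and return its caption."""
    with open(image_path, "rb") as fh:
        payload = build_interrogate_payload(fh.read())
    req = urllib.request.Request(
        f"{base_url}/sdapi/v1/interrogate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return parse_caption(resp.read())
```

Keeping the payload building and response parsing separate from the network call means both can be checked without a running webui.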