+ Flickr, a huge AI-friendly database of human-captioned images
And the DALL-E architecture can also handle masking, where you initialize only part of the latent space with noise and initialize the other parts from a starting image. The video on their website shows it being used to replace a pet on a chair.
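To make the masking idea concrete, here is a toy sketch of the initialization step as described above. It is not DALL-E's actual code. It assumes a "latent" is just a flat list of floats and a mask marks the positions to regenerate. `init_masked_latent` and the sample values are hypothetical.

```python
import random

def init_masked_latent(start_latent, mask, seed=None):
    """Toy masked initialization (hypothetical helper, not DALL-E's API).

    mask[i] is True where content should be regenerated (filled with noise),
    False where the starting image's latent value is kept as-is.
    """
    if len(start_latent) != len(mask):
        raise ValueError("latent and mask must have the same length")
    rng = random.Random(seed)
    # Masked positions get fresh standard-normal noise; the rest are copied
    # unchanged from the starting image's latent.
    return [rng.gauss(0.0, 1.0) if m else x
            for x, m in zip(start_latent, mask)]

start = [0.5, -0.2, 0.8, 0.1]        # hypothetical encoded starting image
mask = [False, True, True, False]    # regenerate the middle (the "pet")
latent = init_masked_latent(start, mask, seed=0)
```

The unmasked entries pass through untouched, so the chair and background are preserved while the model is free to generate something new in the masked region.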
> + Flickr, a huge AI-friendly database of human-captioned images
You're right, I forgot to mention them. Their metadata is great, and most importantly photos have their licenses tagged and many of them are CC0 (including mine).
Are the content and its tags that great, though? I don't tag the content of my own photos there, and when I've tried to comparison-shop cameras, all the popular images are over-edited /r/shittyhdr art…
Metadata seems really interesting, though. You'd think a visual search AI would want to know the white balance EXIF tags on a photo, so it knows whether a yellow object is actually yellow or just under a streetlight.
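As a rough illustration, a pipeline could turn the relevant EXIF tags into a text hint alongside the caption. This sketch assumes the EXIF data has already been parsed into a dict of tag ID → integer value by some EXIF reader. The tag IDs are from the EXIF spec (WhiteBalance = 0xA403, LightSource = 0x9208), but `lighting_hint` is hypothetical, and only a few LightSource values are mapped.

```python
# EXIF tag IDs (from the EXIF specification).
WHITE_BALANCE = 0xA403   # 0 = auto white balance, 1 = manual white balance
LIGHT_SOURCE = 0x9208    # 0 = unknown, 1 = daylight, 2 = fluorescent, ...

# Partial mapping of LightSource values; anything else is reported as "other".
LIGHT_SOURCE_NAMES = {0: "unknown", 1: "daylight", 2: "fluorescent",
                      3: "tungsten", 4: "flash"}

def lighting_hint(exif: dict) -> str:
    """Build a short text hint from already-parsed EXIF tags (hypothetical helper)."""
    wb_text = {0: "auto white balance", 1: "manual white balance"}.get(
        exif.get(WHITE_BALANCE), "white balance not recorded")
    light = LIGHT_SOURCE_NAMES.get(exif.get(LIGHT_SOURCE, 0), "other")
    return f"{wb_text}, light source: {light}"

hint = lighting_hint({WHITE_BALANCE: 1, LIGHT_SOURCE: 3})
```

Many cameras leave LightSource at 0 (unknown), so in practice this is a weak signal rather than ground truth about the scene's illumination.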