
With Llava, which was posted here like yesterday, I took screenshots of video games, which I like to play. The descriptions weren't perfect, but they were really nice to have! I had a picture in my camera roll of my sister and me in the park, having our picture taken by my mother. I remember that day. And that's kinda what pictures are for! I guess. I've been blind since birth so I've not been able to really have this kind of thing before. Apple built something basic into VoiceOver a few years ago, but it's nothing like this. VoiceOver's recognition is like "Two people at a park posing for a photo." Llava is like "In the image, a man and a woman pose together for a picture outside. The woman stands to the left of the man as they both strike a pose. They appear to be in a field with plants behind them. The man is wearing a black shirt, and both of them seem to be enjoying the moment." That's about on the level of a demo Facebook did a good 5 years ago of their stupid image description thing. I mean even if it's not perfect, it's better than what we've had before. MiniGPT said it was two people in a barn. This gets even closer. GPT-4's image descriptions will probably be even better! I can't wait to try those!


