
Are you suggesting using the CLIP embedding of the text as a feature to train a standard ML model on?




I think they're suggesting doing that with BERT for text and CLIP for images, which in my experience is indeed quite effective (and easy/fast).
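For anyone who hasn't seen the pattern, here is a rough sketch of "frozen embedding as features, simple model on top". To keep it runnable without downloading weights, `embed` is a hypothetical stand-in (a hashed bag-of-words vector), not BERT or CLIP; in practice you would swap in the output of a pretrained encoder. The nearest-centroid classifier, `DIM`, and the toy data are likewise my own illustrative choices, and any standard model (logistic regression, gradient boosting) could sit in its place.

```python
import math
import zlib
from collections import defaultdict

DIM = 64  # assumed toy size; real BERT/CLIP embeddings are much larger


def embed(text):
    """Stand-in for a frozen pretrained encoder (hypothetical placeholder).

    Hashes each token into one of DIM buckets and L2-normalises the result.
    """
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
    return [v / norm for v in vec]


def fit_centroids(texts, labels):
    """Train a nearest-centroid classifier: one mean embedding per label."""
    sums = {}
    counts = defaultdict(int)
    for text, label in zip(texts, labels):
        emb = embed(text)
        if label not in sums:
            sums[label] = [0.0] * DIM
        sums[label] = [s + x for s, x in zip(sums[label], emb)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def predict(centroids, text):
    """Return the label whose centroid has the largest dot product with the embedding."""
    emb = embed(text)
    return max(
        centroids,
        key=lambda label: sum(c * x for c, x in zip(centroids[label], emb)),
    )


if __name__ == "__main__":
    train_texts = [
        "refund my broken order",
        "the package arrived damaged",
        "love this great product",
        "works great, very happy",
    ]
    train_labels = ["complaint", "complaint", "praise", "praise"]
    model = fit_centroids(train_texts, train_labels)
    print(predict(model, "my order arrived broken"))
```

The encoder is never trained here; only the small model on top is fit, which is why this tends to be cheap and quick to iterate on.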

There have been some recent developments in the image-of-text/other-than-photograph area, though. For instance, from Meta (although they seem unsure of what exactly their AI division is called): https://arxiv.org/abs/2510.05014 and from Qihoo360: https://arxiv.org/abs/2510.27350.


I think he is. I do things like that plenty.


