Hacker News

It is exciting that you can train a CLIP-style model from scratch with only 4M datapoints. But if you've got that data, why not fine-tune a pretrained model on your 4M points? It seems likely to outperform the from-scratch approach.
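For context, whether training from scratch or fine-tuning, CLIP-style models optimize a symmetric contrastive loss over in-batch image-text pairs. A minimal NumPy sketch of that objective (function name, shapes, and temperature value are illustrative, not code from the project under discussion):

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss: matching pairs sit on the diagonal."""
    # L2-normalize so dot products become cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature        # (N, N) similarity matrix
    n = logits.shape[0]

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # average the image->text and text->image directions
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
aligned = clip_contrastive_loss(img, img.copy())        # perfectly matched pairs
random = clip_contrastive_loss(img, rng.normal(size=(4, 8)))
```

With aligned embeddings the diagonal dominates and the loss is near zero, while random pairings score close to log(N); fine-tuning simply continues minimizing this same objective from a pretrained checkpoint rather than from random weights.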



The difference is not only in the data source but also in the pre-training tasks. But you are right: models fine-tuned on human-annotated data are far better at image retrieval than zero-shot (pre-trained only) ones. This holds for CLIP, ALBEF, ViCHA, and UForm.


Any plans to document how to fine-tune your models, then?


It will take some time, but yes, we have this in our plans.


Perhaps this approach could lead to better training of foundation models?


More efficient - for sure!



