> We develop a large multimodal model (LMM) by connecting the open-set visual encoder of CLIP [36] with the language decoder LLaMA, and fine-tuning them end-to-end on our generated instructional vision-language data.
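The quoted sentence only states that a CLIP vision encoder is wired into a LLaMA decoder and trained end-to-end; it does not spell out how the two are connected. Below is a minimal sketch of one such wiring, assuming a single linear projection layer that maps CLIP patch features into LLaMA's embedding space and prepends them to the text tokens. The checkpoint names (`openai/clip-vit-large-patch14`, `huggyllama/llama-7b`) and the `ClipLlamaLMM` class are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn as nn
from transformers import CLIPVisionModel, LlamaForCausalLM


class ClipLlamaLMM(nn.Module):
    """Sketch of an LMM: CLIP vision encoder -> linear projector -> LLaMA decoder."""

    def __init__(self,
                 vision_name: str = "openai/clip-vit-large-patch14",   # assumed checkpoint
                 llm_name: str = "huggyllama/llama-7b"):                # assumed checkpoint
        super().__init__()
        self.vision_encoder = CLIPVisionModel.from_pretrained(vision_name)
        self.llm = LlamaForCausalLM.from_pretrained(llm_name)
        # Projector mapping CLIP patch features into LLaMA's hidden size
        # (a single linear layer here; the quote does not specify the design).
        self.projector = nn.Linear(self.vision_encoder.config.hidden_size,
                                   self.llm.config.hidden_size)

    def forward(self, pixel_values, input_ids, labels=None):
        # Encode the image into a sequence of patch embeddings.
        vision_out = self.vision_encoder(pixel_values=pixel_values)
        image_tokens = self.projector(vision_out.last_hidden_state)   # (B, N_img, H)

        # Embed the text instruction with LLaMA's own embedding table.
        text_embeds = self.llm.get_input_embeddings()(input_ids)      # (B, N_txt, H)

        # Prepend the projected image tokens to the text tokens and decode.
        inputs_embeds = torch.cat([image_tokens, text_embeds], dim=1)

        if labels is not None:
            # Image positions carry no language-modeling targets (ignore index -100).
            ignore = torch.full(image_tokens.shape[:2], -100,
                                dtype=labels.dtype, device=labels.device)
            labels = torch.cat([ignore, labels], dim=1)

        # End-to-end fine-tuning: gradients flow through the projector and,
        # depending on the training recipe, the encoder and decoder as well.
        return self.llm(inputs_embeds=inputs_embeds, labels=labels)
```

In this sketch, "end-to-end" simply means the loss from the language decoder backpropagates through the projector (and optionally the encoder and decoder weights) on image-instruction-response triples; which components are kept frozen at each training stage is a separate design choice not fixed by the quoted sentence.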