Instill AI changelog

Introducing LLaVA 🌋: Your Multimodal Assistant


📣 The latest LLaVA model, LLaVA-v1.6-7B, is now accessible on Instill Cloud!

What's LLaVA?

LLaVA, short for Large Language and Vision Assistant, is an open-source multimodal model fine-tuned on multimodal instruction-following data. Despite being trained on a relatively small dataset, LLaVA shows remarkable proficiency in understanding images and answering questions about them, with capabilities comparable to multimodal models such as GPT-4 with Vision (GPT-4V) from OpenAI.

What's New in LLaVA 1.6?

According to the original blog post, LLaVA-v1.6 boasts several enhancements compared to LLaVA-v1.5:

  • Enhanced Visual Perception: LLaVA now supports images with up to 4x more pixels, allowing it to capture finer visual details. It accommodates three aspect ratios, with resolutions of up to 672x672, 336x1344, and 1344x336.

  • Improved Visual Reasoning and OCR: LLaVA's visual reasoning and Optical Character Recognition (OCR) capabilities have been significantly enhanced, thanks to an improved mixture of visual instruction tuning data.

  • Better Visual Conversations: LLaVA now handles a broader range of visual conversation scenarios and applications, and demonstrates improved world knowledge and logical reasoning.

  • Efficient Deployment: LLaVA supports efficient deployment and inference, using SGLang for fast, streamlined serving.

👉 Dive into our tutorial to learn how to leverage LLaVA's capabilities effectively.
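To give a concrete sense of how a hosted LLaVA model could be queried, here is a minimal Python sketch that sends an image and a question to a vision-language endpoint. The endpoint URL, payload fields, and header names below are illustrative assumptions, not the documented Instill Cloud API; refer to the tutorial above for the exact request format.

import base64
import requests

API_TOKEN = "YOUR_API_TOKEN"  # placeholder; obtain a real token from Instill Cloud
ENDPOINT = "https://example.instill.tech/llava-v1.6-7b/trigger"  # hypothetical URL for illustration only

# Encode a local image so it can travel inside a JSON payload.
with open("street_scene.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Assumed payload shape: one input carrying a text instruction and a base64-encoded image.
payload = {
    "inputs": [
        {
            "prompt": "What is unusual about this image?",
            "image": image_b64,
        }
    ]
}

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    json=payload,
    timeout=60,
)
response.raise_for_status()
print(response.json())  # the model's answer about the image

Only the endpoint and field names from the tutorial should need to change; the overall pattern of sending an image plus an instruction and receiving a text answer is exactly the interaction LLaVA's instruction tuning is designed for.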