Module llava


The LLaVA (Large Language and Vision Assistant) model.

This module provides the main model implementation, combining a vision tower (CLIP) with a language model (Llama) for multimodal capabilities. The architecture implements the training-free projection technique.
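
The data flow described above can be illustrated with a minimal, dependency-free sketch. It is not this module's API: the function names `project` and `build_input`, the plain `Vec<Vec<f32>>` tensors, the single linear projection, and the choice to place image tokens before the text tokens are all assumptions made for illustration. The idea shown is only that vision features are mapped into the language model's embedding width and then joined with text embeddings into one sequence.

```rust
// Hypothetical sketch; none of these names come from the llava module itself.

/// Multiply a (rows x in_dim) feature matrix by an (in_dim x out_dim) weight
/// matrix, mapping vision features to the language model's embedding width.
fn project(features: &[Vec<f32>], weight: &[Vec<f32>]) -> Vec<Vec<f32>> {
    let out_dim = weight.first().map_or(0, |row| row.len());
    features
        .iter()
        .map(|row| {
            (0..out_dim)
                .map(|j| {
                    row.iter()
                        .zip(weight)
                        .map(|(x, w_row)| x * w_row[j])
                        .sum::<f32>()
                })
                .collect()
        })
        .collect()
}

/// Join projected image tokens and text token embeddings into one sequence.
/// Assumption: image tokens come first; a real model may splice them in at
/// an image placeholder position instead.
fn build_input(image_tokens: Vec<Vec<f32>>, text_tokens: Vec<Vec<f32>>) -> Vec<Vec<f32>> {
    image_tokens.into_iter().chain(text_tokens).collect()
}

fn main() {
    // Two image patches with a vision width of 3 (stand-in for CLIP output).
    let patches = vec![vec![1.0, 0.0, 2.0], vec![0.0, 1.0, 1.0]];
    // Projection weight mapping vision width 3 to text width 2.
    let weight = vec![vec![1.0, 0.0], vec![0.0, 1.0], vec![0.5, 0.5]];
    let image_tokens = project(&patches, &weight);

    // One text token embedding of width 2 (stand-in for the Llama embedding).
    let text_tokens = vec![vec![0.1, 0.2]];
    let sequence = build_input(image_tokens, text_tokens);

    println!("{} tokens of width {}", sequence.len(), sequence[0].len());
}
```

In this sketch every token in the final sequence has the same width, which is what lets the language model consume image and text tokens uniformly.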

Modules§

config
utils

Structs§

ClipVisionTower
IdentityMap
LLaVA
MMProjector