Module paligemma

Multimodal, multi-purpose model combining a Gemma-based language model with SigLIP image understanding.

See PaLiGemma details at:

The model is a multimodal combination of:

  • SigLIP vision encoder
  • Gemma language model
  • Cross-projection layers
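
To make the role of the projection layers concrete, here is a minimal sketch in plain Rust with no tensor library. It assumes the projector is a single linear layer that maps each vision-encoder token to the language model's embedding width, and that the projected image tokens are placed ahead of the text embeddings in the input sequence. The `Projector` type, the `merge_image_and_text` helper, and all dimensions are hypothetical names chosen for illustration, not this module's API.

```rust
/// Hypothetical stand-in for a multimodal projector: one linear layer that
/// maps vision features (width = columns of `weight`) to the language model
/// width (= rows of `weight`). Not the module's actual `MultiModalProjector`.
struct Projector {
    weight: Vec<Vec<f32>>, // shape: [text_dim][vision_dim]
    bias: Vec<f32>,        // shape: [text_dim]
}

impl Projector {
    /// Apply `y = W x + b` to every image token independently.
    fn forward(&self, image_tokens: &[Vec<f32>]) -> Vec<Vec<f32>> {
        image_tokens
            .iter()
            .map(|token| {
                self.weight
                    .iter()
                    .zip(&self.bias)
                    .map(|(row, b)| {
                        // Dot product of one weight row with the token, plus bias.
                        row.iter().zip(token).map(|(w, x)| w * x).sum::<f32>() + b
                    })
                    .collect()
            })
            .collect()
    }
}

/// Assumed merge step: projected image tokens first, then text embeddings,
/// producing one sequence for the language model to consume.
fn merge_image_and_text(
    image_tokens: Vec<Vec<f32>>,
    text_embeds: Vec<Vec<f32>>,
) -> Vec<Vec<f32>> {
    image_tokens.into_iter().chain(text_embeds).collect()
}
```

In this sketch, two image tokens of width 3 projected to width 2 and merged with one text embedding of width 2 yield a sequence of three width-2 vectors. A real implementation would operate on batched tensors rather than nested vectors.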

Structs

Config
Model
MultiModalProjector