BLIP model implementation with quantization support.
BLIP is a vision-language model for image understanding and generation tasks. This implementation adds quantization support to reduce memory use and compute cost.
Key characteristics:
- Vision encoder using the ViT architecture
- Text decoder using a BERT-style transformer
- Cross-attention between vision and text features
- Support for 8-bit quantization
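To make the memory saving from 8-bit quantization concrete, here is a minimal standalone sketch of symmetric block-wise quantization in plain Rust. It is not this crate's implementation: `BLOCK_SIZE`, `BlockQ8`, `quantize`, and `dequantize` are hypothetical names chosen for illustration, and the one-`f32`-scale-per-32-weights layout is an assumption.

```rust
/// Number of weights sharing one scale factor (hypothetical choice for this sketch).
const BLOCK_SIZE: usize = 32;

/// One quantized block: a single f32 scale plus BLOCK_SIZE signed 8-bit values.
/// (Illustrative type, not part of the crate.)
#[derive(Debug, Clone)]
struct BlockQ8 {
    scale: f32,
    values: [i8; BLOCK_SIZE],
}

/// Quantize a slice of f32 weights, block by block.
/// Each block is scaled so that its largest magnitude maps to 127.
fn quantize(weights: &[f32]) -> Vec<BlockQ8> {
    assert!(
        weights.len() % BLOCK_SIZE == 0,
        "length must be a multiple of BLOCK_SIZE"
    );
    weights
        .chunks_exact(BLOCK_SIZE)
        .map(|chunk| {
            let max_abs = chunk.iter().fold(0.0f32, |m, w| m.max(w.abs()));
            // An all-zero block gets a dummy scale to avoid dividing by zero.
            let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
            let mut values = [0i8; BLOCK_SIZE];
            for (q, w) in values.iter_mut().zip(chunk) {
                *q = (w / scale).round().clamp(-127.0, 127.0) as i8;
            }
            BlockQ8 { scale, values }
        })
        .collect()
}

/// Recover approximate f32 weights from the quantized blocks.
fn dequantize(blocks: &[BlockQ8]) -> Vec<f32> {
    blocks
        .iter()
        .flat_map(|b| b.values.iter().map(move |&q| q as f32 * b.scale))
        .collect()
}

fn main() {
    // 64 evenly spaced weights in [-1, 1].
    let weights: Vec<f32> = (0..64).map(|i| (i as f32 / 63.0) * 2.0 - 1.0).collect();
    let blocks = quantize(&weights);
    let restored = dequantize(&blocks);

    // Largest absolute round-trip error across all weights.
    let max_err = weights
        .iter()
        .zip(&restored)
        .fold(0.0f32, |m, (a, b)| m.max((a - b).abs()));

    let f32_bytes = weights.len() * std::mem::size_of::<f32>();
    let q8_bytes = blocks.len() * std::mem::size_of::<BlockQ8>();
    println!("f32 bytes: {f32_bytes}, quantized bytes: {q8_bytes}, max error: {max_err}");
}
```

Under these assumptions each block of 32 weights takes 36 bytes (32 one-byte values plus a 4-byte scale) instead of 128, and the round-trip error per weight is bounded by roughly half the block's scale. Real quantized formats differ in block size, scale precision, and rounding details.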
Re-exports
pub use crate::quantized_var_builder::VarBuilder;