Mistral model implementation with quantization support.
Mistral is a large language model designed for efficiency. This implementation adds quantization support, which reduces memory use and compute cost.
Key characteristics:
- Sliding window attention mechanism
- Grouped query attention (GQA)
- RMSNorm for layer normalization
- Rotary positional embeddings (RoPE)
- Support for 8-bit quantization
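Two of the characteristics above, sliding window attention and RMSNorm, can be illustrated with a minimal standalone sketch. It uses plain `Vec`s rather than this crate's tensor types, and the helper names (`can_attend`, `sliding_window_mask`, `rms_norm`) are hypothetical, not part of this module's API. The window boundary convention (the window counts the current token) is also an assumption and may differ from the convention used by the actual implementation.

```rust
/// Hypothetical helper: returns true when query position `q` may attend to
/// key position `k` under causal sliding window attention.
/// Assumption: the window includes the current token, so a window of 2
/// covers positions `q - 1` and `q`.
fn can_attend(q: usize, k: usize, window: usize) -> bool {
    k <= q && q - k < window
}

/// Hypothetical helper: builds a `seq_len x seq_len` boolean mask where
/// `true` means attention is allowed.
fn sliding_window_mask(seq_len: usize, window: usize) -> Vec<Vec<bool>> {
    (0..seq_len)
        .map(|q| (0..seq_len).map(|k| can_attend(q, k, window)).collect())
        .collect()
}

/// Hypothetical helper: RMSNorm over one vector,
/// `out_i = x_i / sqrt(mean(x^2) + eps) * weight_i`.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), weight.len());
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let scale = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * scale * w).collect()
}
```

With a sequence length of 4 and a window of 2, each position attends only to itself and its immediate predecessor, so attention cost per token is bounded by the window size rather than the full sequence length. Unlike LayerNorm, RMSNorm does not subtract the mean; it only rescales by the root mean square.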
Re-exports
pub use crate::quantized_var_builder::VarBuilder;
pub use crate::models::mistral::Config;