Module quantized_mistral

Mistral model implementation with quantization support.

Mistral is a large language model optimized for efficiency. This implementation runs the model with quantized weights, which reduces memory use and compute cost.
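As a rough illustration of how 8-bit quantization saves memory, the sketch below stores a block of `f32` weights as one shared scale plus one `i8` code per weight. The names `BlockQ8`, `quantize_block`, and `dequantize_block` are hypothetical and are not part of this module; the block formats actually used by the quantized weights may differ in block size and scale precision.

```rust
/// Hypothetical quantized block: one shared scale plus one 8-bit code per weight.
struct BlockQ8 {
    scale: f32,
    codes: Vec<i8>,
}

/// Symmetric 8-bit quantization: map the largest magnitude in the block to 127.
fn quantize_block(values: &[f32]) -> BlockQ8 {
    let max_abs = values.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    // An all-zero block gets a dummy scale to avoid dividing by zero.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let codes = values
        .iter()
        .map(|v| (v / scale).round() as i8)
        .collect();
    BlockQ8 { scale, codes }
}

/// Recover approximate f32 weights from the codes and the shared scale.
fn dequantize_block(block: &BlockQ8) -> Vec<f32> {
    block.codes.iter().map(|&c| c as f32 * block.scale).collect()
}
```

Each weight then occupies one byte instead of four, plus the per-block scale. The cost is a rounding error of at most about half a scale step per weight.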

Key characteristics:

  • Sliding window attention mechanism
  • Grouped query attention (GQA)
  • RMSNorm for layer normalization
  • Rotary positional embeddings (RoPE)
  • Support for 8-bit quantization
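Two of the characteristics above can be sketched in a few lines of plain Rust. This is a minimal illustration, not the code used by `Model`: the function names are hypothetical, and the exact window boundary (whether a query sees `window` or `window + 1` keys) varies between implementations.

```rust
/// Sliding window attention: query position `i` may attend to key position `j`
/// only if `j` is not in the future and lies within the last `window` positions.
/// Assumption: the window counts the query position itself.
fn can_attend(i: usize, j: usize, window: usize) -> bool {
    j <= i && i - j < window
}

/// Boolean attention mask for a sequence; `true` means "allowed to attend".
fn sliding_window_mask(seq_len: usize, window: usize) -> Vec<Vec<bool>> {
    (0..seq_len)
        .map(|i| (0..seq_len).map(|j| can_attend(i, j, window)).collect())
        .collect()
}

/// Grouped query attention: consecutive query heads share one key/value head.
/// With 32 query heads and 8 KV heads, each KV head serves 4 query heads.
fn kv_head_for_query_head(q_head: usize, num_q_heads: usize, num_kv_heads: usize) -> usize {
    assert!(num_kv_heads > 0 && num_q_heads % num_kv_heads == 0);
    assert!(q_head < num_q_heads);
    q_head / (num_q_heads / num_kv_heads)
}
```

The mask keeps attention cost per token bounded by the window size rather than the full sequence length, and the head mapping shrinks the key/value cache by the ratio of query heads to KV heads.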

Re-exports§

pub use crate::quantized_var_builder::VarBuilder;
pub use crate::models::mistral::Config;

Structs§

Model