Module quantized_mpt

Quantized MPT model implementation.

MPT (e.g. MPT-7B) is a series of causal transformer models optimized for code generation. This implementation runs the model from quantized weights, which reduces memory use and compute cost.
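To make the memory saving concrete, here is a minimal sketch of symmetric 8-bit block quantization in plain Rust. It is an illustration only, not the code this module uses: the `BlockQ8` type and both function names are hypothetical, and the scheme (one `f32` scale shared by a block of values, each stored as an `i8`) is merely similar in spirit to common 8-bit block formats.

```rust
/// One quantized block: a shared scale plus one i8 per original value.
/// (Hypothetical type for illustration; not part of this module.)
struct BlockQ8 {
    scale: f32,
    values: Vec<i8>,
}

/// Quantize a block so that the largest magnitude maps to +/-127.
fn quantize_block(xs: &[f32]) -> BlockQ8 {
    let max_abs = xs.iter().fold(0f32, |m, x| m.max(x.abs()));
    // Avoid dividing by zero for an all-zero block.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let values = xs.iter().map(|x| (x / scale).round() as i8).collect();
    BlockQ8 { scale, values }
}

/// Recover approximate f32 values from a quantized block.
fn dequantize_block(block: &BlockQ8) -> Vec<f32> {
    block.values.iter().map(|&q| q as f32 * block.scale).collect()
}
```

Each weight then costs one byte instead of four, plus a small per-block overhead for the scale, and the round-trip error of any value is bounded by about half the block's scale.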

Key characteristics:

  • Grouped-query attention, including multi-query attention (MQA)
  • Support for KV-caching
  • Pre-computed ALiBi attention biases
  • Support for 8-bit quantization
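The pre-computed ALiBi biases listed above can be sketched in plain Rust as follows. This is a hedged illustration rather than this module's code: the function names and the `bias_max` parameter are hypothetical, the head count is assumed to be a power of two, and the bias is taken relative to the most recent key position, which is one common way to lay it out.

```rust
/// ALiBi slopes for `n_heads` heads: head h (1-based) gets 2^(-h * bias_max / n_heads).
/// Assumes `n_heads` is a power of two.
fn alibi_slopes(n_heads: usize, bias_max: f32) -> Vec<f32> {
    (1..=n_heads)
        .map(|h| 2f32.powf(-(h as f32) * bias_max / n_heads as f32))
        .collect()
}

/// One bias row per head over `seq_len` key positions.
/// The latest key gets bias 0; older keys get increasingly negative values.
fn alibi_bias(n_heads: usize, seq_len: usize, bias_max: f32) -> Vec<Vec<f32>> {
    alibi_slopes(n_heads, bias_max)
        .into_iter()
        .map(|slope| {
            (0..seq_len)
                .map(|j| slope * (j as f32 - (seq_len as f32 - 1.0)))
                .collect()
        })
        .collect()
}
```

With 4 heads and `bias_max = 8.0`, the slopes are 1/4, 1/16, 1/64 and 1/256, so the first head's row over three keys is `[-0.5, -0.25, 0.0]`. Because these rows depend only on the head count and sequence length, they can be computed once and added to the attention scores on every forward pass.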

Re-exports

pub use crate::quantized_var_builder::VarBuilder;
pub use super::mpt::Config;

Structs

Model