Expand description
Quantized MPT model implementation.
MPT (MPT-7B) is a causal transformer model series optimized for code generation. This implementation provides quantization for reduced memory and compute.
Key characteristics:
- Multi-Query Grouped Attention (MQA)
- Support for KV-caching
- Pre-computed ALiBi attention biases
- Support for 8-bit quantization
References:
Re-exports§
pub use crate::quantized_var_builder::VarBuilder;pub use super::mpt::Config;