Module containing a quantized MixFormer model implementation.
MixFormer is an efficient transformer variant for text generation that uses mixture-of-experts and parallel attention/feed-forward blocks. This implementation provides quantization for reduced memory usage.
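The parallel block layout can be illustrated with a minimal, dependency-free sketch. It assumes the usual parallel residual form, `out = x + attn(norm(x)) + mlp(norm(x))`, applied to a single hidden vector. The names `parallel_block` and `layer_norm` and the closure stand-ins for attention and the feed-forward layer are hypothetical and are not part of this module's API.

```rust
/// Layer normalization without learned scale/bias (illustrative only).
fn layer_norm(x: &[f32], eps: f32) -> Vec<f32> {
    let n = x.len() as f32;
    let mean = x.iter().sum::<f32>() / n;
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / n;
    let inv_std = 1.0 / (var + eps).sqrt();
    x.iter().map(|v| (v - mean) * inv_std).collect()
}

/// Hypothetical parallel residual block: both branches read the same
/// normalized input, and their outputs are added to the residual together.
fn parallel_block<A, M>(x: &[f32], attn: A, mlp: M) -> Vec<f32>
where
    A: Fn(&[f32]) -> Vec<f32>,
    M: Fn(&[f32]) -> Vec<f32>,
{
    let normed = layer_norm(x, 1e-5);
    let attn_out = attn(&normed); // stand-in for the attention branch
    let mlp_out = mlp(&normed); // stand-in for the feed-forward branch
    x.iter()
        .zip(attn_out.iter().zip(mlp_out.iter()))
        .map(|(x, (a, m))| x + a + m)
        .collect()
}

fn main() {
    let x = [1.0f32, 2.0, 3.0];
    // Toy branches: identity "attention" and a doubling "feed-forward".
    let out = parallel_block(
        &x,
        |h: &[f32]| h.to_vec(),
        |h: &[f32]| h.iter().map(|v| 2.0 * v).collect(),
    );
    println!("{out:?}");
}
```

Because neither branch depends on the other's output, the two can be computed independently, which is what distinguishes this layout from the sequential attention-then-feed-forward ordering.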
Key features:
- Parallel attention and feed-forward computation
- Rotary positional embeddings
- Optional key-value caching
- Support for 8-bit quantization
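To make the 8-bit quantization idea concrete, here is a minimal sketch of symmetric per-block quantization in plain Rust. It is not this module's storage format: the `QBlock` type, the `quantize` and `dequantize` functions, and the block size of 32 are all assumptions chosen for illustration.

```rust
/// Hypothetical block size chosen for illustration.
const BLOCK_SIZE: usize = 32;

/// One quantized block: a shared scale plus signed 8-bit values.
#[derive(Debug, Clone, PartialEq)]
struct QBlock {
    scale: f32,
    values: Vec<i8>,
}

/// Quantizes weights block by block, mapping each block's largest
/// magnitude to 127 so every value fits in an i8.
fn quantize(weights: &[f32]) -> Vec<QBlock> {
    weights
        .chunks(BLOCK_SIZE)
        .map(|chunk| {
            let max_abs = chunk.iter().fold(0.0f32, |m, v| m.max(v.abs()));
            // Avoid dividing by zero for an all-zero block.
            let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
            let values = chunk
                .iter()
                .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
                .collect();
            QBlock { scale, values }
        })
        .collect()
}

/// Reconstructs approximate f32 weights from the quantized blocks.
fn dequantize(blocks: &[QBlock]) -> Vec<f32> {
    blocks
        .iter()
        .flat_map(|b| b.values.iter().map(move |&q| q as f32 * b.scale))
        .collect()
}

fn main() {
    let weights: Vec<f32> = (0..64).map(|i| (i as f32 - 32.0) / 16.0).collect();
    let blocks = quantize(&weights);
    let restored = dequantize(&blocks);
    let max_err = weights
        .iter()
        .zip(&restored)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0f32, f32::max);
    println!("blocks: {}, max abs error: {max_err}", blocks.len());
}
```

Under these assumptions each block of 32 weights stores 32 bytes of values plus one 4-byte scale, roughly 9 bits per weight instead of 32, at the cost of a rounding error of at most half a scale step per weight.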
Re-exports
pub use crate::quantized_var_builder::VarBuilder;
pub use crate::models::mixformer::Config;