Recurrent Gemma model implementation with quantization support.
Recurrent Gemma is a large language model optimized for efficiency. This implementation loads quantized weights to reduce memory use and compute.
Key characteristics:
- Recurrent blocks with gated recurrent units
- Convolution and attention blocks
- RMSNorm for layer normalization
- Rotary positional embeddings (RoPE)
- Support for 8-bit quantization
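To make the 8-bit quantization item above concrete, here is a minimal sketch of symmetric, block-wise 8-bit quantization in plain Rust. It is not this crate's implementation: the names `BlockQ8`, `BLOCK_SIZE`, `quantize_q8`, and `dequantize_q8` are hypothetical, and the block size of 32 is an assumption chosen for illustration. Each block stores one `f32` scale and one `i8` per weight, so a weight is recovered approximately as `q * scale`.

```rust
// Illustrative sketch only: every name here is hypothetical and not part of candle.

/// Assumed number of weights that share one scale in this sketch.
pub const BLOCK_SIZE: usize = 32;

/// One quantized block: a shared scale plus signed 8-bit values.
#[derive(Debug, Clone, PartialEq)]
pub struct BlockQ8 {
    pub scale: f32,
    pub values: Vec<i8>,
}

/// Quantize `weights` block by block using the largest absolute value
/// in each block to pick the scale (symmetric "absmax" quantization).
pub fn quantize_q8(weights: &[f32]) -> Vec<BlockQ8> {
    weights
        .chunks(BLOCK_SIZE)
        .map(|chunk| {
            let abs_max = chunk.iter().fold(0.0f32, |m, w| m.max(w.abs()));
            let scale = abs_max / 127.0;
            // An all-zero block has scale 0; avoid dividing by zero.
            let inv = if scale > 0.0 { 1.0 / scale } else { 0.0 };
            let values = chunk
                .iter()
                .map(|w| (w * inv).round().clamp(-127.0, 127.0) as i8)
                .collect();
            BlockQ8 { scale, values }
        })
        .collect()
}

/// Reconstruct approximate `f32` weights from quantized blocks.
pub fn dequantize_q8(blocks: &[BlockQ8]) -> Vec<f32> {
    blocks
        .iter()
        .flat_map(|b| b.values.iter().map(move |&q| q as f32 * b.scale))
        .collect()
}
```

Under these assumptions a full block of 32 weights takes 36 bytes (32 one-byte values plus a 4-byte scale) instead of 128 bytes as `f32`, and the round-trip error for each weight is at most about half of that block's scale.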
References:
Re-exports
pub use crate::quantized_var_builder::VarBuilder;