Module quantized_llama


Quantized llama model implementation.

This module provides a quantized implementation of the llama language model architecture. Model weights are stored in a low-bit quantized form, which reduces memory usage while aiming to preserve model quality.

Key characteristics:

  • Transformer decoder architecture

  • Support for 2/3/4/8-bit quantization

  • Optimized memory usage through quantization

  • Configurable model sizes and parameter counts

  • 💻 GH Link

  • 📝 Paper
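To make the memory trade-off above concrete, here is a minimal, self-contained sketch of block-wise symmetric quantization. It is an illustration only: `QBlock`, `quantize_block`, `dequantize_block`, and `quantized_bytes` are hypothetical names, and the layout (one `f32` scale per block, one signed integer per weight) is an assumption chosen for clarity, not the storage format this module uses.

```rust
/// One block of quantized weights: a shared scale plus one signed integer
/// per weight. Hypothetical layout for illustration only.
struct QBlock {
    scale: f32,
    values: Vec<i8>,
}

/// Symmetric quantization of one block to `bits` bits (2 through 8).
fn quantize_block(weights: &[f32], bits: u32) -> QBlock {
    assert!((2..=8).contains(&bits), "bits must be in 2..=8");
    // Largest representable magnitude, e.g. 7 for 4-bit values.
    let qmax = ((1i32 << (bits - 1)) - 1) as f32;
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    // Avoid dividing by zero when every weight in the block is zero.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / qmax };
    let values = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-qmax, qmax) as i8)
        .collect();
    QBlock { scale, values }
}

/// Recover approximate f32 weights from a quantized block.
fn dequantize_block(block: &QBlock) -> Vec<f32> {
    block.values.iter().map(|&q| q as f32 * block.scale).collect()
}

/// Approximate storage in bytes for `n` weights packed at `bits` bits each,
/// with one f32 scale per block of `block_size` weights.
fn quantized_bytes(n: usize, bits: u32, block_size: usize) -> usize {
    let packed = (n * bits as usize + 7) / 8;
    let blocks = (n + block_size - 1) / block_size;
    packed + blocks * 4
}

fn main() {
    let weights = [1.0f32, -1.0, 0.0, 0.25];
    let block = quantize_block(&weights, 4);
    let restored = dequantize_block(&block);
    println!("quantized: {:?}", block.values);
    println!("restored:  {:?}", restored);
    println!(
        "1024 weights: {} bytes as f32, {} bytes at 4 bits",
        1024 * 4,
        quantized_bytes(1024, 4, 32)
    );
}
```

Under these assumptions, each restored weight differs from the original by at most half a quantization step (`scale / 2`), and lower bit widths trade a larger step for a smaller footprint.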

Structs

ModelWeights

Constants

MAX_SEQ_LEN