Module quantized_llama2_c

Quantized Llama2 model implementation.

This module provides an 8-bit quantized implementation of Meta’s LLaMA2 language model, which reduces memory usage and speeds up inference compared with full-precision weights.

Key characteristics:

  • Decoder-only transformer architecture
  • RoPE position embeddings
  • Grouped Query Attention
  • 8-bit quantization of weights
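
As an illustration of the last point, the idea behind 8-bit weight quantization can be sketched as follows. This is a minimal sketch assuming a simple symmetric scheme with one scale per block of weights; the function names `quantize_q8` and `dequantize_q8` are hypothetical and are not part of this crate, whose actual storage format may differ.

```rust
/// Quantize a block of f32 weights to i8 using one shared scale.
/// Hypothetical helper for illustration; not this crate's API.
fn quantize_q8(weights: &[f32]) -> (Vec<i8>, f32) {
    // The largest magnitude in the block sets the scale so that every
    // value maps into the range [-127, 127].
    let max_abs = weights.iter().fold(0.0f32, |m, w| m.max(w.abs()));
    // Avoid dividing by zero when the whole block is zero.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Recover approximate f32 weights from the quantized block.
fn dequantize_q8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}

fn main() {
    let weights = [0.8f32, -1.0, 0.3, 0.0];
    let (q, scale) = quantize_q8(&weights);
    let restored = dequantize_q8(&q, scale);
    for (w, r) in weights.iter().zip(&restored) {
        // The round-trip error stays within about half a quantization step.
        assert!((w - r).abs() <= scale * 0.5 + 1e-6);
    }
    println!("scale = {scale}, quantized = {q:?}");
}
```

Each weight then occupies one byte instead of four, plus a small per-block overhead for the scale, which is where the memory saving comes from.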

Re-exports

pub use crate::quantized_var_builder::VarBuilder;

Structs

QLlama