Module qwen2

Qwen2 model implementation with quantization support.

Qwen2 is a family of large language models from Alibaba. This implementation supports quantized weights, which reduces memory use and compute cost.

Key characteristics:

  • Streaming decode support
  • Grouped query attention (GQA)
  • RMSNorm for layer normalization
  • Rotary positional embeddings (RoPE)
  • Support for 8-bit quantization
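Three of the characteristics listed above (RMSNorm, grouped query attention, and 8-bit quantization) can be illustrated with a minimal standalone sketch. The function names below (`rms_norm`, `kv_head_for_query_head`, `quantize_q8`, `dequantize_q8`) are hypothetical and are not part of this module's API. The quantization shown is a simple symmetric per-tensor scheme chosen for illustration; the scheme this module actually uses is not stated here and may differ (for example, it may be blockwise).

```rust
/// RMSNorm sketch: divide each element by the root-mean-square of the
/// vector, then multiply by a learned per-channel weight.
/// (Hypothetical helper, not this module's API.)
pub fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), weight.len(), "weight must match input length");
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}

/// GQA sketch: several query heads share one key/value head.
/// Returns the index of the key/value head used by `q_head`.
/// (Hypothetical helper, not this module's API.)
pub fn kv_head_for_query_head(q_head: usize, num_q_heads: usize, num_kv_heads: usize) -> usize {
    assert!(num_kv_heads > 0 && num_q_heads % num_kv_heads == 0);
    assert!(q_head < num_q_heads);
    let group_size = num_q_heads / num_kv_heads;
    q_head / group_size
}

/// 8-bit quantization sketch (assumption: symmetric, one scale per tensor).
/// Returns the quantized values and the scale needed to recover them.
pub fn quantize_q8(values: &[f32]) -> (Vec<i8>, f32) {
    let max_abs = values.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    // Avoid a zero scale when every input is zero.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q: Vec<i8> = values
        .iter()
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (q, scale)
}

/// Inverse of `quantize_q8`: recover approximate f32 values.
pub fn dequantize_q8(q: &[i8], scale: f32) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}
```

In this sketch each weight is stored in one byte instead of four, at the cost of a rounding error of at most half a quantization step per value. With, say, 28 query heads and 4 key/value heads, `kv_head_for_query_head` maps query heads 0 through 6 to key/value head 0, so the key/value cache holds 4 heads rather than 28.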

Structs

Config
Model
ModelForCausalLM