Qwen2 model implementation with quantization support.
Qwen2 is a large language model from Alibaba, optimized for efficiency. This implementation adds quantization support to reduce memory use and compute cost.
Key characteristics:
- Streaming decode support
- Grouped query attention (GQA)
- RMSNorm for layer normalization
- Rotary positional embeddings (RoPE)
- Support for 8-bit quantization
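Three of the characteristics above (RMSNorm, grouped query attention, and 8-bit quantization) can be illustrated with a minimal, dependency-free sketch. The function names `rms_norm`, `kv_head_for_query_head`, `quantize_q8`, and `dequantize_q8` are hypothetical and are not part of this module's API. The quantization scheme shown (one scale per block, symmetric, rounded to `i8`) is an assumption chosen for illustration; the formats actually used by this implementation may differ in block size and layout.

```rust
/// RMSNorm (illustrative): divide each element by the root-mean-square of
/// the vector, then multiply by a learned per-channel weight.
/// `eps` guards against division by zero.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), weight.len());
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}

/// Grouped query attention (illustrative): consecutive query heads share one
/// key/value head. Returns the index of the KV head used by `q_head`.
fn kv_head_for_query_head(q_head: usize, n_q_heads: usize, n_kv_heads: usize) -> usize {
    assert!(n_kv_heads > 0 && n_q_heads % n_kv_heads == 0);
    assert!(q_head < n_q_heads);
    let group_size = n_q_heads / n_kv_heads;
    q_head / group_size
}

/// Symmetric 8-bit quantization of one block (assumed scheme): store a single
/// f32 scale and one i8 per value, so that value ≈ scale * q.
fn quantize_q8(block: &[f32]) -> (f32, Vec<i8>) {
    let max_abs = block.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    // An all-zero block would give a zero scale; use 1.0 to avoid dividing by zero.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let q: Vec<i8> = block
        .iter()
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (scale, q)
}

/// Inverse of `quantize_q8`: recover approximate f32 values.
fn dequantize_q8(scale: f32, q: &[i8]) -> Vec<f32> {
    q.iter().map(|&v| v as f32 * scale).collect()
}
```

With 8 query heads and 2 key/value heads, for example, `kv_head_for_query_head(5, 8, 2)` maps query head 5 to KV head 1. Quantizing a block and dequantizing it again reproduces each value to within about half of the block's scale, which is the accuracy traded for storing one byte per weight instead of four.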
References:
- 🤗 Qwen2 Model