Qwen2 model implementation with quantization support.
Qwen2 is a chat-optimized language model. This implementation supports 8-bit quantized weights, which reduce memory usage and can speed up inference.
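To illustrate where the memory saving comes from, here is a minimal sketch of block-wise symmetric 8-bit quantization. It is not this module's code: the names `BlockQ8`, `quantize_q8`, and `dequantize_q8` are hypothetical, and the block size of 32 with one scale per block is an assumption loosely modeled on GGML-style 8-bit formats.

```rust
/// Number of weights that share one scale (assumed for illustration).
const BLOCK_SIZE: usize = 32;

/// One quantized block: a single f32 scale plus signed 8-bit values.
/// Hypothetical type, not part of this module's API.
#[derive(Debug, Clone, PartialEq)]
pub struct BlockQ8 {
    pub scale: f32,
    pub values: Vec<i8>,
}

/// Quantize weights block by block: each block is scaled so that its
/// largest absolute value maps to 127, then rounded to i8.
pub fn quantize_q8(weights: &[f32]) -> Vec<BlockQ8> {
    weights
        .chunks(BLOCK_SIZE)
        .map(|chunk| {
            let max_abs = chunk.iter().fold(0.0f32, |m, w| m.max(w.abs()));
            // Avoid dividing by zero for an all-zero block.
            let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
            let values = chunk.iter().map(|w| (w / scale).round() as i8).collect();
            BlockQ8 { scale, values }
        })
        .collect()
}

/// Recover approximate f32 weights by multiplying each i8 by its block scale.
pub fn dequantize_q8(blocks: &[BlockQ8]) -> Vec<f32> {
    blocks
        .iter()
        .flat_map(|b| b.values.iter().map(move |&q| q as f32 * b.scale))
        .collect()
}
```

Under these assumptions a block of 32 weights takes 32 bytes of values plus 4 bytes of scale, compared with 128 bytes as `f32`, and each weight is recovered to within half a quantization step of its original value.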
Key characteristics:
- Grouped-Query Attention (GQA)
- RMSNorm for layer normalization
- Rotary positional embeddings (RoPE)
- Support for 8-bit quantization
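The first three characteristics above can be sketched in plain Rust. This is an illustrative sketch only, not the code used by this module: the function names are hypothetical, the RoPE variant shown pairs element `i` with element `i + len / 2` (one common convention, assumed here), and `eps` and `base` are left as parameters because their values come from the model configuration.

```rust
/// RMSNorm: divide each element by the root-mean-square of the vector,
/// then multiply by a learned per-element weight.
pub fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), weight.len());
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}

/// GQA: several query heads share one key/value head. This maps a query
/// head index to the key/value head it attends with.
pub fn kv_head_for_query(query_head: usize, num_heads: usize, num_kv_heads: usize) -> usize {
    assert!(num_heads % num_kv_heads == 0);
    query_head / (num_heads / num_kv_heads)
}

/// RoPE: rotate each pair (x[i], x[i + half]) by an angle that depends on
/// the token position and on the pair index.
pub fn rope(x: &[f32], position: usize, base: f32) -> Vec<f32> {
    let half = x.len() / 2;
    let mut out = x.to_vec();
    for i in 0..half {
        // Rotation frequency decreases geometrically with the pair index.
        let freq = base.powf(-2.0 * i as f32 / x.len() as f32);
        let (sin, cos) = (position as f32 * freq).sin_cos();
        out[i] = x[i] * cos - x[i + half] * sin;
        out[i + half] = x[i] * sin + x[i + half] * cos;
    }
    out
}
```

With 8 query heads and 2 key/value heads, for example, query heads 0 to 3 share key/value head 0 and heads 4 to 7 share head 1, which shrinks the key/value cache by a factor of four relative to full multi-head attention.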
References: