Qwen2 model implementation with Mixture of Experts support.
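This Qwen2 variant is a large language model built on sparse Mixture of Experts (MoE) layers: for each token, a router activates only a small subset of the experts, so only a fraction of the model's parameters is used per forward pass. This implementation provides those sparsely activated MoE layers.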
Key characteristics:
- Mixture of Experts architecture
- Sparse expert activation
- Shared expert routing mechanism
- Grouped query attention (GQA)
- RMSNorm for layer normalization
- Rotary positional embeddings (RoPE)
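
To make "sparse expert activation" and "shared expert routing" from the list above more concrete, here is a minimal sketch of top-k routing for a single token, using only the standard library. It is not this crate's API: `ToyExpert`, `route_top_k`, and `moe_forward` are hypothetical names, each expert is reduced to a scalar multiply standing in for a real MLP, and both the renormalization of the top-k weights and the sigmoid gate on the shared expert are assumptions about how such a layer is commonly wired.

```rust
// Illustrative sketch only. All names here are hypothetical, not crate API.

/// Stand-in for an expert MLP: scales every element of the input (assumption).
struct ToyExpert {
    scale: f32,
}

impl ToyExpert {
    fn forward(&self, x: &[f32]) -> Vec<f32> {
        x.iter().map(|v| v * self.scale).collect()
    }
}

/// Numerically stable softmax over the router logits.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

fn sigmoid(x: f32) -> f32 {
    1.0 / (1.0 + (-x).exp())
}

/// Select the `k` experts with the highest routing probability.
/// Returns `(expert_index, weight)` pairs, highest weight first.
/// If `renormalize` is true, the selected weights are rescaled to sum to 1.
fn route_top_k(router_logits: &[f32], k: usize, renormalize: bool) -> Vec<(usize, f32)> {
    let mut ranked: Vec<(usize, f32)> = softmax(router_logits).into_iter().enumerate().collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    ranked.truncate(k);
    if renormalize {
        let total: f32 = ranked.iter().map(|(_, w)| w).sum();
        for (_, w) in ranked.iter_mut() {
            *w /= total;
        }
    }
    ranked
}

/// Sparse MoE output for one token: a weighted sum over the selected experts,
/// plus a shared expert that always runs, scaled by a sigmoid gate (assumption).
fn moe_forward(
    token: &[f32],
    router_logits: &[f32],
    experts: &[ToyExpert],
    shared: &ToyExpert,
    shared_gate_logit: f32,
    k: usize,
) -> Vec<f32> {
    let mut out = vec![0.0f32; token.len()];
    // Only the k selected experts are evaluated; the rest are skipped entirely.
    for (idx, weight) in route_top_k(router_logits, k, true) {
        for (o, v) in out.iter_mut().zip(experts[idx].forward(token)) {
            *o += weight * v;
        }
    }
    let gate = sigmoid(shared_gate_logit);
    for (o, v) in out.iter_mut().zip(shared.forward(token)) {
        *o += gate * v;
    }
    out
}
```

With four experts and `k = 2`, only two expert forward passes run per token, which is where the compute savings of a sparse layer come from. The shared expert is evaluated for every token regardless of the router's choice.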
References: