Module qwen2_moe

Qwen2 model implementation with Mixture of Experts support.

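Qwen2-MoE is the Mixture of Experts (MoE) variant of the Qwen2 large language model. Each MoE layer routes every token to a small subset of expert feed-forward networks, so only part of the model's parameters is active for any given token. This module implements these sparsely activated MoE layers.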
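The sparse activation described above can be sketched in plain Rust for a single token. This is a minimal illustration, not this module's code. It assumes softmax gating over all experts, keeps the `k` highest-probability experts, optionally renormalizes their weights, and sums the weighted outputs of only those experts; as far as we know, this matches how the Hugging Face `transformers` Qwen2-MoE block routes tokens. The names `route_top_k` and `sparse_moe`, and the use of function pointers as stand-ins for the expert MLPs, are hypothetical.

```rust
/// Hypothetical helper: pick the top-k experts for one token.
/// `logits` holds one raw router score per expert.
/// Returns (expert_index, weight) pairs sorted by descending weight.
fn route_top_k(logits: &[f32], k: usize, renormalize: bool) -> Vec<(usize, f32)> {
    // Numerically stable softmax over all experts.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&l| (l - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let mut probs: Vec<(usize, f32)> = exps.iter().map(|&e| e / sum).enumerate().collect();

    // Keep only the k most probable experts; the others are never evaluated.
    probs.sort_by(|a, b| b.1.total_cmp(&a.1));
    probs.truncate(k);

    // Optionally rescale the kept weights so they sum to 1.
    if renormalize {
        let kept: f32 = probs.iter().map(|&(_, p)| p).sum();
        for (_, p) in probs.iter_mut() {
            *p /= kept;
        }
    }
    probs
}

/// Hypothetical sparse MoE step for one token: y = sum over selected i of w_i * expert_i(x).
/// `experts` stands in for the per-expert feed-forward networks.
fn sparse_moe(
    x: &[f32],
    logits: &[f32],
    experts: &[fn(&[f32]) -> Vec<f32>],
    k: usize,
    renormalize: bool,
) -> Vec<f32> {
    let mut out = vec![0.0; x.len()];
    for (idx, w) in route_top_k(logits, k, renormalize) {
        // Only the selected experts are run.
        let y = experts[idx](x);
        for (o, v) in out.iter_mut().zip(y) {
            *o += w * v;
        }
    }
    out
}
```

For example, with router logits `[2.0, 1.0, -1.0]`, `k = 2` and renormalization enabled, only experts 0 and 1 run, with weights of roughly 0.731 and 0.269; expert 2 is skipped entirely, which is where the compute savings of sparse activation come from.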

Key characteristics:

  • Mixture of Experts architecture
  • Sparse expert activation
  • Shared expert routing mechanism
  • Grouped query attention (GQA)
  • RMSNorm for layer normalization
  • Rotary positional embeddings (RoPE)
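Several of the listed building blocks reduce to a few lines of arithmetic. The sketch below works on plain `f32` slices for a single token or head rather than on tensors, and all of its function names are hypothetical. It assumes RMSNorm with a learned per-channel scale and no bias, the "rotate half" RoPE layout that pairs dimension `i` with `i + d/2`, GQA in which consecutive query heads share one key/value head, and a shared expert whose output is scaled by a sigmoid gate before being added to the routed experts' output. These assumptions follow the common Qwen2 reference design; the module source is authoritative for the exact behavior.

```rust
/// RMSNorm: y_i = x_i / sqrt(mean(x^2) + eps) * weight_i.
/// Unlike LayerNorm, there is no mean subtraction and no bias.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}

/// RoPE for one head vector at position `pos` (assumed "rotate half" layout):
/// each pair (x_i, x_{i + d/2}) is rotated by the angle pos * theta^(-2i/d).
fn apply_rope(x: &[f32], pos: usize, theta: f32) -> Vec<f32> {
    let d = x.len();
    let half = d / 2;
    let mut out = vec![0.0; d];
    for i in 0..half {
        let inv_freq = theta.powf(-(2.0 * i as f32) / d as f32);
        let (sin, cos) = (pos as f32 * inv_freq).sin_cos();
        out[i] = x[i] * cos - x[i + half] * sin;
        out[i + half] = x[i + half] * cos + x[i] * sin;
    }
    out
}

/// GQA: map a query head to the key/value head it attends with.
/// Assumes n_heads is a multiple of n_kv_heads.
fn kv_head_for(q_head: usize, n_heads: usize, n_kv_heads: usize) -> usize {
    q_head / (n_heads / n_kv_heads)
}

/// Shared expert: always evaluated, scaled by sigmoid(gate_logit),
/// then added to the output of the routed (top-k) experts.
fn add_shared_expert(routed: &[f32], shared: &[f32], gate_logit: f32) -> Vec<f32> {
    let g = 1.0 / (1.0 + (-gate_logit).exp());
    routed.iter().zip(shared).map(|(r, s)| r + g * s).collect()
}
```

GQA saves memory because the key/value cache only stores `n_kv_heads` heads, while RoPE is a pure rotation, so it changes a vector's direction but not its length.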


Structs

Config
Model