Module quantized_qwen2

Qwen2 model implementation with quantization support.

Qwen2 is a chat-optimized language model that supports 8-bit quantization for reduced memory usage and faster inference.
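To make the memory saving concrete, here is a minimal sketch of symmetric 8-bit quantization in plain Rust. It is not this module's implementation: the function names `quantize_block` and `dequantize_block`, and the choice of one shared scale per block, are assumptions made for illustration only. Each `f32` weight (4 bytes) is stored as one `i8` (1 byte) plus a single scale shared by the block.

```rust
/// Quantize a block of f32 values to i8 using one shared scale.
/// Hypothetical helper for illustration; returns (scale, quantized values).
fn quantize_block(values: &[f32]) -> (f32, Vec<i8>) {
    // The largest magnitude in the block determines the scale.
    let max_abs = values.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    if max_abs == 0.0 {
        // An all-zero block needs no scale; avoid dividing by zero.
        return (0.0, vec![0; values.len()]);
    }
    let scale = max_abs / 127.0;
    let quantized = values
        .iter()
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (scale, quantized)
}

/// Recover approximate f32 values from a quantized block.
fn dequantize_block(scale: f32, quantized: &[i8]) -> Vec<f32> {
    quantized.iter().map(|&q| q as f32 * scale).collect()
}
```

Calling `quantize_block(&[0.5, -1.0, 0.25, 0.0])` and passing the result to `dequantize_block` returns values that differ from the inputs by at most about half the scale, which is the rounding error this kind of scheme trades for the smaller footprint.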

Key characteristics:

  • Grouped Query Attention (GQA)
  • RMSNorm for layer normalization
  • Rotary positional embeddings (RoPE)
  • Support for 8-bit quantization
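As an illustration of one item in the list above, RMSNorm can be sketched in a few lines of plain Rust. This is a standalone sketch of the general technique, not the code used by this module: the function name `rms_norm` and its slice-based signature are assumptions. Each element is divided by the root mean square of the vector, then multiplied by a learned per-element weight.

```rust
/// RMSNorm over a single vector: y_i = x_i / sqrt(mean(x^2) + eps) * w_i.
/// Hypothetical standalone helper for illustration.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), weight.len(), "x and weight must have the same length");
    // Mean of the squared elements (no mean subtraction, unlike LayerNorm).
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    // eps keeps the denominator away from zero for near-zero inputs.
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}
```

With unit weights, the output has a root mean square close to 1 regardless of the input's scale, which is the property the normalization is meant to provide.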

Structs

ModelWeights