Mixtral model, a sparse mixture-of-experts (MoE) model based on the Mistral architecture.
See Mixtral model details at:
The model uses a mixture-of-experts architecture with:
- 8 experts per layer
- Top-2 expert routing (each token is processed by its 2 highest-scoring experts)
- Sliding window attention
- RoPE embeddings
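
The top-2 routing step listed above can be sketched in plain Rust, without any tensor library. This is an illustration only, not this module's implementation: the helper names `softmax` and `top_k_route` are hypothetical, and the sketch assumes the router applies a softmax over all expert logits and then renormalises the weights of the selected experts so they sum to 1.

```rust
/// Numerically stabilised softmax over one token's router logits.
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Hypothetical helper: select the `k` highest-probability experts for one
/// token and renormalise their weights so they sum to 1 (assumed behaviour).
/// Returns `(expert_index, weight)` pairs, highest weight first.
fn top_k_route(router_logits: &[f32], k: usize) -> Vec<(usize, f32)> {
    let probs = softmax(router_logits);
    let mut indexed: Vec<(usize, f32)> = probs.iter().cloned().enumerate().collect();
    // Sort by probability, descending; the sort is stable, so ties keep
    // the lower expert index first.
    indexed.sort_by(|a, b| b.1.total_cmp(&a.1));
    indexed.truncate(k);
    // Renormalise over the selected experts only.
    let total: f32 = indexed.iter().map(|&(_, p)| p).sum();
    indexed.into_iter().map(|(i, p)| (i, p / total)).collect()
}
```

With 8 router logits and `k = 2`, `top_k_route` returns two pairs. The token's output would then be the weighted sum of those two experts' outputs, so the other 6 experts are never evaluated for that token.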
References: