Module mixtral

Mixtral Model, a sparse mixture-of-experts model based on the Mistral architecture.

See Mixtral model details at:

The model uses a mixture of experts architecture with:

  • 8 experts per layer
  • Top 2 expert routing
  • Sliding window attention
  • RoPE embeddings
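
As a rough illustration of what "8 experts per layer" with "top 2 expert routing" means for a single token, here is a minimal sketch in plain Rust. It is not the code of this module: `softmax`, `route_top_k`, and the constants are hypothetical names introduced for illustration, and the sketch assumes the router applies a softmax over all expert logits, keeps the two largest probabilities, and renormalizes them so they sum to 1.

```rust
/// Number of experts per layer, as stated in the module description.
const NUM_EXPERTS: usize = 8;
/// Number of experts selected per token (top-2 routing).
const TOP_K: usize = 2;

/// Numerically stable softmax over a slice of router logits.
fn softmax(logits: &[f32]) -> Vec<f32> {
    // Subtract the maximum before exponentiating to avoid overflow.
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

/// Hypothetical helper (an assumption, not part of this crate's API):
/// returns the TOP_K `(expert index, weight)` pairs for one token,
/// ordered by descending weight, with weights renormalized to sum to 1.
fn route_top_k(router_logits: &[f32; NUM_EXPERTS]) -> Vec<(usize, f32)> {
    let probs = softmax(router_logits);
    let mut indexed: Vec<(usize, f32)> = probs.into_iter().enumerate().collect();
    // Sort experts by descending routing probability.
    indexed.sort_by(|a, b| b.1.total_cmp(&a.1));
    // Keep only the TOP_K most probable experts.
    indexed.truncate(TOP_K);
    // Renormalize the kept weights so they sum to 1.
    let total: f32 = indexed.iter().map(|(_, w)| w).sum();
    indexed.into_iter().map(|(i, w)| (i, w / total)).collect()
}
```

Under these assumptions, the layer output for a token would be the weighted sum of the two selected experts' outputs, so only 2 of the 8 expert feed-forward blocks run per token.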

References:

Structs

Config
https://github.com/huggingface/transformers/blob/1a585c1222a56bcaecc070966d558d4a9d862e83/src/transformers/models/mixtral/configuration_mixtral.py#L113
Model