Microsoft Phi-3 model implementation
See Phi model details at:
The Phi series are decoder-only transformers designed for code and language tasks. Key characteristics:
- Decoder-only transformer architecture
- RoPE embeddings
- Layer normalization
- QK normalization
- Mixed activation functions
- Improved context window handling
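To make the RoPE item above concrete, here is a minimal standalone sketch of rotary position embeddings applied to a single attention-head vector. It is an illustration, not this module's code: the function name `apply_rope`, the `base` parameter, and the interleaved pairing of elements `(x[2i], x[2i+1])` are all assumptions made for the example, and real implementations often use a split-half layout and cached sin/cos tables instead.

```rust
/// Hypothetical helper: applies rotary position embeddings (RoPE) to one
/// head vector `x` for a token at index `position`.
///
/// Each consecutive pair (x[2i], x[2i+1]) is rotated by the angle
/// `position * base^(-2i / dim)`. The interleaved pairing is an assumption
/// of this sketch, not necessarily the layout used by this module.
fn apply_rope(x: &[f32], position: usize, base: f32) -> Vec<f32> {
    assert!(x.len() % 2 == 0, "head dimension must be even");
    let dim = x.len();
    let mut out = vec![0.0f32; dim];
    for i in 0..dim / 2 {
        // Lower pair indices rotate faster; higher ones rotate slower.
        let inv_freq = base.powf(-(2.0 * i as f32) / dim as f32);
        let angle = position as f32 * inv_freq;
        let (sin, cos) = angle.sin_cos();
        let (a, b) = (x[2 * i], x[2 * i + 1]);
        // Standard 2D rotation of the pair (a, b).
        out[2 * i] = a * cos - b * sin;
        out[2 * i + 1] = a * sin + b * cos;
    }
    out
}

/// Plain dot product, used here only to inspect attention scores.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

Because every pair is only rotated, the vector's length is unchanged, and the score `dot(apply_rope(q, m, base), apply_rope(k, n, base))` depends on the positions only through the offset `m - n`. That relative-position property is the reason RoPE is applied to queries and keys before attention scores are computed.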
References: