Persimmon Model
A transformer language model for efficient inference and general-purpose tasks. The model uses a standard transformer architecture with:
- Layer normalization for Q/K attention
- RoPE embeddings with partial rotary factor
- Squared ReLU activation
- Separately configurable numbers of attention heads and KV heads
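
As a rough illustration of two of the points above, the following standalone sketch shows how a partial rotary factor could limit RoPE to the leading part of each head, and how the attention-head and KV-head counts relate. It does not use this crate's API: `SketchConfig`, its field names, `apply_partial_rope`, and the sample values are all hypothetical, and the pairing of dimension `i` with `i + rotary_dim / 2` is an assumed (rotate-half) convention.

```rust
/// Hypothetical configuration for illustration only; not this crate's `Config`.
#[derive(Debug, Clone)]
struct SketchConfig {
    hidden_size: usize,
    num_attention_heads: usize,
    num_key_value_heads: usize,
    partial_rotary_factor: f64,
}

impl SketchConfig {
    /// Width of a single attention head.
    fn head_dim(&self) -> usize {
        self.hidden_size / self.num_attention_heads
    }

    /// Number of leading dimensions per head that receive rotary embeddings.
    fn rotary_dim(&self) -> usize {
        (self.partial_rotary_factor * self.head_dim() as f64) as usize
    }

    /// How many query heads share each key/value head.
    fn query_heads_per_kv_head(&self) -> usize {
        self.num_attention_heads / self.num_key_value_heads
    }
}

/// Rotate the first `rotary_dim` entries of one head vector for position `pos`
/// and copy the remaining entries through unchanged.
fn apply_partial_rope(x: &[f32], rotary_dim: usize, pos: usize, theta: f32) -> Vec<f32> {
    let mut out = x.to_vec();
    let half = rotary_dim / 2;
    for i in 0..half {
        // Assumed frequency schedule: theta^(-2i / rotary_dim).
        let freq = 1.0 / theta.powf(2.0 * i as f32 / rotary_dim as f32);
        let (sin, cos) = (pos as f32 * freq).sin_cos();
        let (a, b) = (x[i], x[i + half]);
        out[i] = a * cos - b * sin;
        out[i + half] = a * sin + b * cos;
    }
    out
}

fn main() {
    // Sample values chosen for readability, not taken from a released checkpoint.
    let cfg = SketchConfig {
        hidden_size: 64,
        num_attention_heads: 8,
        num_key_value_heads: 4,
        partial_rotary_factor: 0.5,
    };
    let x: Vec<f32> = (0..cfg.head_dim()).map(|v| v as f32).collect();
    let y = apply_partial_rope(&x, cfg.rotary_dim(), 3, 10_000.0);
    println!("head_dim = {}, rotary_dim = {}", cfg.head_dim(), cfg.rotary_dim());
    println!("query heads per KV head = {}", cfg.query_heads_per_kv_head());
    println!("rotated part:   {:?}", &y[..cfg.rotary_dim()]);
    println!("unrotated part: {:?}", &y[cfg.rotary_dim()..]);
}
```

With a factor of 0.5, only the first half of each head carries positional information; the remaining dimensions pass through untouched.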
References:
- 💻 Hugging Face Implementation
- 💻 Persimmon Config
- 🤗 Hugging Face