Expand description
Gemma LLM architecture (Google) inference implementation.
See “Introducing Gemma 3: The most capable model you can run on a single GPU or TPU”
Based on implementations from HuggingFace transformers.
Gemma LLM architecture (Google) inference implementation.
See “Introducing Gemma 3: The most capable model you can run on a single GPU or TPU”
Based on implementations from HuggingFace transformers.