Ollama integration: GPU inference on NVIDIA/AMD/Apple, no Candle limitations

- docker-compose: Ollama container with GPU access + persistent volume for the models
- native-node: Candle removed, now calls Ollama's HTTP API (async)
- Dockerfile: simplified, no CUDA SDK (Ollama handles the GPU)
- Supports all models: qwen2.5-coder:1.5b/3b/7b/14b/32b
- The model is selected with the OLLAMA_MODEL environment variable
- kpn models lists the Ollama models with throughput figures
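The native-node bullet above (calling Ollama's HTTP API asynchronously, with the model taken from OLLAMA_MODEL) could look roughly like this. It is a minimal sketch assuming Node.js 18+ (global `fetch`) and Ollama's documented `POST /api/generate` endpoint on its default port 11434. The `OLLAMA_URL` variable, the default model, and the helper names are hypothetical and not taken from the commit. The tok/s value is derived from the `eval_count` and `eval_duration` (nanoseconds) fields Ollama returns when `stream` is false.

```javascript
// Hypothetical default; the commit does not say which model is the default.
const DEFAULT_MODEL = 'qwen2.5-coder:7b';

// Builds the fetch() arguments for Ollama's /api/generate endpoint.
// OLLAMA_URL is a hypothetical variable; 11434 is Ollama's default port.
function buildGenerateRequest(prompt, env = process.env) {
  const baseUrl = env.OLLAMA_URL || 'http://localhost:11434';
  return {
    url: `${baseUrl}/api/generate`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: env.OLLAMA_MODEL || DEFAULT_MODEL, // switch model via env var
        prompt,
        stream: false, // ask for one JSON object instead of streamed chunks
      }),
    },
  };
}

// eval_count = generated tokens, eval_duration = generation time in nanoseconds.
function tokensPerSecond(result) {
  if (!result.eval_count || !result.eval_duration) return 0;
  return result.eval_count / (result.eval_duration / 1e9);
}

// Sends the prompt and returns the generated text plus measured throughput.
async function generate(prompt, env = process.env) {
  const { url, init } = buildGenerateRequest(prompt, env);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Ollama HTTP ${res.status}`);
  const result = await res.json();
  return { text: result.response, tokPerSec: tokensPerSecond(result) };
}
```

With OLLAMA_MODEL=qwen2.5-coder:3b set in the environment, `await generate('Write a debounce function')` sends the prompt to that model; because the call is async, the node can serve other requests while the GPU generates.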

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 06:22:11 +03:00
parent d8443792a3
commit 3eb0c4d939
6 changed files with 126 additions and 290 deletions


@@ -2282,10 +2282,12 @@ Files: ${Object.keys(generatedFiles).join(', ')}`;
if (sub === 'models') {
termLog(' <span style="color:#d29922">Browser (kpn load):</span>', '#c9d1d9');
termLog(' qwen-coder Qwen2.5-Coder:0.5B <span style="color:#8b949e">~990 MB | WASM ~0.4 tok/s</span>');
termLog(' <span style="color:#3fb950">Native (Docker GPU/CPU):</span>', '#c9d1d9');
termLog(' qwen-coder Qwen2.5-Coder:0.5B <span style="color:#8b949e">~990 MB | ~8 tok/s</span>');
termLog(' Usage: kpn run coder "&lt;prompt&gt;"', '#8b949e');
termLog(' qwen-coder:0.5b <span style="color:#8b949e">~990 MB | WASM ~0.4 tok/s</span>');
termLog(' <span style="color:#3fb950">Native (Ollama + GPU):</span>', '#c9d1d9');
termLog(' qwen2.5-coder:7b <span style="color:#8b949e">~4.7 GB | NVIDIA ~80 tok/s | AMD ~40 tok/s | Apple ~30 tok/s</span>');
termLog(' qwen2.5-coder:3b <span style="color:#8b949e">~1.9 GB | NVIDIA ~120 tok/s</span>');
termLog(' qwen2.5-coder:1.5b <span style="color:#8b949e">~1 GB | NVIDIA ~150 tok/s</span>');
termLog(' Change model: <span style="color:#58a6ff">OLLAMA_MODEL=qwen2.5-coder:7b</span>', '#8b949e');
termLog(' The hub automatically routes requests to the fastest node', '#8b949e');
return;
}
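The last line of the listing says the hub routes requests to the fastest node, but the diff does not show how. A minimal sketch of one way to do it, assuming each node reports whether it is online, which model it serves, and its measured tok/s; the node record shape and `pickFastestNode` are hypothetical, not the hub's actual code:

```javascript
// Hypothetical node record: { id, model, tokPerSec, online }.
// Returns the online node with the highest measured tok/s for the requested
// model, or null if no online node serves that model. Ties keep the first node seen.
function pickFastestNode(nodes, model) {
  let best = null;
  for (const node of nodes) {
    if (!node.online || node.model !== model) continue;
    if (best === null || node.tokPerSec > best.tokPerSec) best = node;
  }
  return best;
}
```

With the figures from the listing (NVIDIA ~80, AMD ~40, Apple ~30 tok/s for qwen2.5-coder:7b), this picks the NVIDIA node; if that node goes offline, the AMD node is chosen next. A real hub would likely also weigh current load, which this sketch ignores.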