Ollama integration: GPU inference on NVIDIA/AMD/Apple, without Candle's limitations

- docker-compose: Ollama container with GPU access + persistent volume for models
- native-node: Candle removed; calls Ollama's HTTP API (async)
- Dockerfile: simplified, no CUDA SDK (Ollama handles the GPU)
- Supports all models: qwen2.5-coder:1.5b/3b/7b/14b/32b
- The OLLAMA_MODEL environment variable switches the model
- kpn models shows the Ollama models with speed figures

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
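The commit message says native-node now calls Ollama's HTTP API and picks the model from OLLAMA_MODEL. The native-node source is not part of this diff, so the following is only a minimal sketch of that idea, written in JavaScript to match the hunk below. The helper names (`buildGenerateRequest`, `generate`), the fallback model, and the base URL parameter are assumptions for illustration. The `/api/generate` endpoint, its `model`/`prompt`/`stream` fields, and the default port 11434 come from Ollama's API.

```javascript
// Sketch only: helper names and the fallback model are assumptions,
// not taken from the commit.
const DEFAULT_MODEL = 'qwen2.5-coder:1.5b'; // assumed fallback when OLLAMA_MODEL is unset

// Build the HTTP request for Ollama's /api/generate endpoint.
function buildGenerateRequest(prompt, env = process.env, baseUrl = 'http://localhost:11434') {
  const model = env.OLLAMA_MODEL || DEFAULT_MODEL;
  return {
    url: `${baseUrl}/api/generate`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // stream: false requests one JSON object instead of a stream of chunks
      body: JSON.stringify({ model, prompt, stream: false }),
    },
  };
}

// Send the prompt and return the generated text.
async function generate(prompt, env = process.env) {
  const { url, options } = buildGenerateRequest(prompt, env);
  const res = await fetch(url, options); // global fetch requires Node 18+
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = await res.json();
  return data.response; // the non-streaming reply carries the text in `response`
}
```

Splitting request construction from the network call keeps the model selection easy to check without a running Ollama instance; switching models then only requires setting OLLAMA_MODEL (for example `OLLAMA_MODEL=qwen2.5-coder:7b`) before starting the process.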
@@ -2282,10 +2282,12 @@ Files: ${Object.keys(generatedFiles).join(', ')}`;
 
 if (sub === 'models') {
   termLog(' <span style="color:#d29922">Selain (kpn load):</span>', '#c9d1d9');
-  termLog(' qwen-coder Qwen2.5-Coder:0.5B <span style="color:#8b949e">~990 MB | WASM ~0.4 tok/s</span>');
-  termLog(' <span style="color:#3fb950">Natiivi (Docker GPU/CPU):</span>', '#c9d1d9');
-  termLog(' qwen-coder Qwen2.5-Coder:0.5B <span style="color:#8b949e">~990 MB | ~8 tok/s</span>');
-  termLog(' Käyttö: kpn run coder "<prompti>"', '#8b949e');
+  termLog(' qwen-coder:0.5b <span style="color:#8b949e">~990 MB | WASM ~0.4 tok/s</span>');
+  termLog(' <span style="color:#3fb950">Natiivi (Ollama + GPU):</span>', '#c9d1d9');
+  termLog(' qwen2.5-coder:7b <span style="color:#8b949e">~4.7 GB | NVIDIA ~80 tok/s | AMD ~40 tok/s | Apple ~30 tok/s</span>');
+  termLog(' qwen2.5-coder:3b <span style="color:#8b949e">~1.9 GB | NVIDIA ~120 tok/s</span>');
+  termLog(' qwen2.5-coder:1.5b <span style="color:#8b949e">~1 GB | NVIDIA ~150 tok/s</span>');
+  termLog(' Vaihda malli: <span style="color:#58a6ff">OLLAMA_MODEL=qwen2.5-coder:7b</span>', '#8b949e');
   termLog(' Hub reitittää automaattisesti nopeimmalle solmulle', '#8b949e');
   return;
 }
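The last status line in the hunk, "Hub reitittää automaattisesti nopeimmalle solmulle", tells the user that the hub automatically routes to the fastest node. The hub's routing code is not part of this diff, so the following is only a hedged sketch of what "fastest node" selection could look like. The node shape (`id`, `tokPerSec`, `online`) and the function name `pickFastestNode` are hypothetical; the throughput numbers are the approximate figures printed by `kpn models` above.

```javascript
// Hypothetical node registry; tok/s values mirror the `kpn models` listing.
const nodes = [
  { id: 'browser-wasm', tokPerSec: 0.4, online: true },
  { id: 'nvidia-gpu', tokPerSec: 80, online: true },
  { id: 'amd-gpu', tokPerSec: 40, online: true },
  { id: 'apple-gpu', tokPerSec: 30, online: false },
];

// Return the online node with the highest reported throughput,
// or null when no node is available.
function pickFastestNode(candidates) {
  const available = candidates.filter((n) => n.online);
  if (available.length === 0) return null;
  return available.reduce((best, n) => (n.tokPerSec > best.tokPerSec ? n : best));
}

console.log(pickFastestNode(nodes).id); // prints "nvidia-gpu"
```

A real hub would presumably measure throughput at runtime rather than rely on static figures, and would also need to account for which model each node has loaded; neither is shown in the commit.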