A moment before WebGPU inference
@@ -41,14 +41,16 @@ Hajautettu AI-laskentaverkko selaimessa ja natiivina. Käyttäjät tarjoavat GPU
|
|||||||
|
|
||||||
- `lib.rs` — Wasm-entrypoint, tehtävävalinta (`SELECTED_TASK`), WebSocket-handler, GPU/CPU-valinta
|
- `lib.rs` — Wasm-entrypoint, tehtävävalinta (`SELECTED_TASK`), WebSocket-handler, GPU/CPU-valinta
|
||||||
- `storage.rs` — IndexedDB read/write (tokenizer, mallin painot)
|
- `storage.rs` — IndexedDB read/write (tokenizer, mallin painot)
|
||||||
|
- `sampling.rs` — Top-k sampling EOS-penaltilla (kiertää Candlen softmax Wasm-bugin)
|
||||||
- `smollm.rs` — SmolLM 135M Candle-inferenssi (Llama-arkkitehtuuri)
|
- `smollm.rs` — SmolLM 135M Candle-inferenssi (Llama-arkkitehtuuri)
|
||||||
- `qwen.rs` — Qwen2.5 0.5B Candle-inferenssi (Qwen2-arkkitehtuuri)
|
- `qwen.rs` — Qwen2.5 0.5B Candle-inferenssi (Qwen2-arkkitehtuuri)
|
||||||
|
- `qwen_coder.rs` — Qwen2.5-Coder 0.5B/3B koodigenerointi (sama arkkitehtuuri, koodikoulutettu)
|
||||||
- `phi3.rs` — Phi-3 placeholder (liian iso selaimelle)
|
- `phi3.rs` — Phi-3 placeholder (liian iso selaimelle)
|
||||||
|
|
||||||
### Native Node (`native-node/src/`)
|
### Native Node (`native-node/src/`)
|
||||||
|
|
||||||
- `main.rs` — GPU-tunnistus (wgpu + NVML + sysfs + Apple), HF Hub -lataus, WS-yhteys
|
- `main.rs` — GPU-tunnistus (wgpu + NVML + sysfs + Apple), HF Hub -lataus, WS-yhteys
|
||||||
- `inference.rs` — Qwen2.5-0.5B Candle-inferenssi, KV-cache reset per prompti, mmap-lataus
|
- `inference.rs` — Qwen2.5-0.5B Candle-inferenssi, CUDA/CPU, KV-cache reset per prompti, mmap-lataus
|
||||||
|
|
||||||
## Kehitysympäristö
|
## Kehitysympäristö
|
||||||
|
|
||||||
@@ -87,6 +89,7 @@ Solmu → hub:
|
|||||||
| `llm_done` | LLM-tulos: `{response, tokens_generated, tokens_per_sec}` |
|
| `llm_done` | LLM-tulos: `{response, tokens_generated, tokens_per_sec}` |
|
||||||
| `llm_chunk` | Streaming-token |
|
| `llm_chunk` | Streaming-token |
|
||||||
| `download_progress` | Mallin latauksen edistyminen |
|
| `download_progress` | Mallin latauksen edistyminen |
|
||||||
|
| `user_text` | Käyttäjän oma teksti: `{text, task_type}` |
|
||||||
|
|
||||||
## API-endpointit
|
## API-endpointit
|
||||||
|
|
||||||
@@ -105,6 +108,7 @@ Solmu → hub:
|
|||||||
- **IP-rajoitus** — max 4 WS-yhteyttä per IP, X-Forwarded-For -tuki
|
- **IP-rajoitus** — max 4 WS-yhteyttä per IP, X-Forwarded-For -tuki
|
||||||
- **Viestivalidointi** — pakollinen `type`, sallitut tyypit, kenttäkohtaiset rajat
|
- **Viestivalidointi** — pakollinen `type`, sallitut tyypit, kenttäkohtaiset rajat
|
||||||
- **Viestikoko** — max 16 KB per WebSocket-viesti
|
- **Viestikoko** — max 16 KB per WebSocket-viesti
|
||||||
|
- **Admin Basic Auth** — `/admin` ja `/api/*` salasanan takana (`ADMIN_PASSWORD` env, oletus: `kipina`)
|
||||||
- **Caddy** — automaattinen TLS (Let's Encrypt)
|
- **Caddy** — automaattinen TLS (Let's Encrypt)
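The connection and size limits in the list above could be enforced along the following lines. This is a minimal sketch using only the standard library, not the hub's actual code: `IpLimiter`, `client_ip_from_forwarded`, and `message_size_ok` are hypothetical names, and taking the leftmost `X-Forwarded-For` entry assumes the hub only ever sits behind a trusted proxy such as Caddy.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Tracks open WebSocket connections per client IP (hypothetical helper).
struct IpLimiter {
    max_per_ip: usize,
    open: HashMap<IpAddr, usize>,
}

impl IpLimiter {
    fn new(max_per_ip: usize) -> Self {
        IpLimiter { max_per_ip, open: HashMap::new() }
    }

    /// Records a new connection and returns true if the IP is under its limit.
    fn try_acquire(&mut self, ip: IpAddr) -> bool {
        let count = self.open.entry(ip).or_insert(0);
        if *count >= self.max_per_ip {
            return false;
        }
        *count += 1;
        true
    }

    /// Releases one connection; drops the entry once no connections remain.
    fn release(&mut self, ip: IpAddr) {
        let remaining = match self.open.get_mut(&ip) {
            Some(count) => {
                *count = count.saturating_sub(1);
                *count
            }
            None => return,
        };
        if remaining == 0 {
            self.open.remove(&ip);
        }
    }
}

/// Reads the leftmost address from an X-Forwarded-For header value.
/// Only meaningful when the header is set by a trusted reverse proxy.
fn client_ip_from_forwarded(header: &str) -> Option<IpAddr> {
    header.split(',').next()?.trim().parse().ok()
}

/// Applies the 16 KB per-message limit to a raw WebSocket text frame.
fn message_size_ok(raw: &str) -> bool {
    raw.len() <= 16 * 1024
}
```

With `IpLimiter::new(4)`, a fifth `try_acquire` from the same address returns `false` until one of the earlier connections calls `release`.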
|
||||||
|
|
||||||
## Tuotanto-deploy
|
## Tuotanto-deploy
|
||||||
@@ -119,10 +123,11 @@ docker compose -f docker-compose.prod.yml down && docker compose -f docker-compo
|
|||||||
|
|
||||||
## Tiedossa olevat rajoitukset
|
## Tiedossa olevat rajoitukset
|
||||||
|
|
||||||
- LLM inference is **greedy** (argmax): no temperature/top-p sampling in Wasm (Candle `SoftmaxLastDim` bug)
|
- LLM inference uses **top-k sampling** (k=10, EOS penalty): no full temperature/top-p support in Wasm
|
||||||
- Qwen in the browser: ~0.4 tok/s on CPU; usable as a demo but not for production
|
- Qwen in the browser: ~0.4 tok/s on CPU; usable as a demo but not for production
|
||||||
|
- Native node + CUDA: ~50-100 tok/s (RTX 4090)
|
||||||
- The hub broadcasts every message to all nodes; there is no targeted routing
|
- The hub broadcasts every message to all nodes; there is no targeted routing
|
||||||
- CUDA support requires installing `nvidia-cuda-toolkit` plus a Cargo.toml feature
|
- The 3B Coder model needs ~12 GB of RAM in the browser (Wasm)
|
||||||
|
|
||||||
## Lisenssi
|
## Lisenssi
|
||||||
|
|
||||||
|
|||||||
@@ -15,20 +15,29 @@ Kipinä Agentic Network on hajautettu tekoälylaskentaverkko, jossa selaimet ja
|
|||||||
jos WebGPU ei tuettu
|
jos WebGPU ei tuettu
|
||||||
```
|
```
|
||||||
|
|
||||||
**Hub** distributes tokenization tasks at random every 10 seconds. Nodes tokenize the input with the Qwen2.5-Coder tokenizer and return the result. The hub prints the results in the terminal and forwards them to the dashboard.
|
**Hub** distributes tasks (tokenization pairs, LLM prompts, coding tasks) every 10 seconds. Nodes only process messages that match their selected task type.
|
||||||
|
|
||||||
## Kaksi tapaa osallistua verkkoon
|
## Kolme tapaa osallistua verkkoon
|
||||||
|
|
||||||
### 1. Selainsolmu (Wasm + WebGPU)
|
### 1. Selainsolmu — Laskentaverkko
|
||||||
- Avaa `http://localhost:3000` | `https://kipina.studio` selaimessa ja klikkaa "Liity laskentaverkkoon"
|
- Avaa `http://localhost:3000` | `https://kipina.studio` ja valitse tehtävä:
|
||||||
- Selain tunnistaa automaattisesti WebGPU-tuen — jos ei löydy, käytetään CPU-fallbackia
|
- **Tokenisointivertailu** — EN/FI-kieliparien BPE-tokenisointitehokkuus (~7 MB lataus)
|
||||||
- Tokenizer ladataan HuggingFacesta ensimmäisellä kerralla ja tallennetaan IndexedDB:hen
|
- **SmolLM 135M** — kevyt LLM-inferenssi (~269 MB, ~1.2 tok/s)
|
||||||
- GPU-kuormitusta voi säätää sliderilla (0–75 %)
|
- **Qwen2.5 0.5B** — tehokkaampi LLM (~990 MB, ~0.4 tok/s)
|
||||||
|
- **Phi-3 Mini 3.8B** — vain native-nodella
|
||||||
|
- WebGPU tunnistetaan automaattisesti, CPU-fallback jos ei tuettu
|
||||||
|
- Mallit ja tokenizerit cachetetaan IndexedDB:hen
|
||||||
|
|
||||||
### 2. Natiivi-node (Rust + NVML)
|
### 2. Selainsolmu — Koodilaboratorio
|
||||||
|
- Erillinen välilehti: **Qwen2.5-Coder** koodigenerointi
|
||||||
|
- Valittavissa **0.5B** (nopea) tai **3B** (laadukas, 6.2 GB lataus)
|
||||||
|
- Oma promptti: kirjoita Python-ohjelmointitehtävä ja paina "Generate"
|
||||||
|
- Syntaksikorostettu koodivastaus
|
||||||
|
|
||||||
|
### 3. Natiivi-node (Rust + CUDA/CPU)
|
||||||
|
- Qwen2.5-0.5B-Instruct inferenssi CUDA:lla (~50-100 tok/s RTX 4090) tai CPU:lla (~11 tok/s)
|
||||||
- Kerää nvidia-smi-tason laitteistotiedot: GPU-nimi, VRAM, lämpötila, kuormitus
|
- Kerää nvidia-smi-tason laitteistotiedot: GPU-nimi, VRAM, lämpötila, kuormitus
|
||||||
- Raportoi järjestelmätiedot: CPU-malli, ytimet, RAM, OS
|
- Lataa mallin automaattisesti HuggingFace Hubista (~990 MB, cachetetaan)
|
||||||
- Yhdistää hubiin ja vastaanottaa tehtäviä
|
|
||||||
|
|
||||||
## Käynnistys
|
## Käynnistys
|
||||||
|
|
||||||
@@ -65,23 +74,26 @@ CARGO_TARGET_DIR=target-native HUB_URL=ws://localhost:3000/ws ALLOCATED_GB=4 car
|
|||||||
CARGO_TARGET_DIR=target-native HUB_URL=wss://kipina.studio/ws ALLOCATED_GB=4 cargo run --release -p native-node
|
CARGO_TARGET_DIR=target-native HUB_URL=wss://kipina.studio/ws ALLOCATED_GB=4 cargo run --release -p native-node
|
||||||
```
|
```
|
||||||
|
|
||||||
### CUDA-tuki (valinnainen)
|
### CUDA-tuki
|
||||||
|
|
||||||
If the machine has an NVIDIA GPU and the CUDA toolkit:
|
CUDA is enabled by default in native-node. It requires `nvidia-cuda-toolkit`:
|
||||||
|
|
||||||
```bash
|
```bash
|
||||||
# Asenna CUDA toolkit (Ubuntu/Pop!_OS)
|
# Asenna (Ubuntu/Pop!_OS)
|
||||||
sudo apt install nvidia-cuda-toolkit
|
sudo apt install nvidia-cuda-toolkit
|
||||||
|
|
||||||
# Muokkaa native-node/Cargo.toml:
|
# Tarkista
|
||||||
# candle-core = { version = "0.8", features = ["cuda"] }
|
nvcc --version
|
||||||
|
|
||||||
# Aja — malli käyttää automaattisesti GPU:ta
|
# Aja — tunnistaa CUDA:n automaattisesti, fallback CPU:lle
|
||||||
CARGO_TARGET_DIR=target-native HUB_URL=ws://localhost:3000/ws cargo run --release -p native-node
|
CARGO_TARGET_DIR=target-native HUB_URL=ws://localhost:3000/ws cargo run --release -p native-node
|
||||||
|
|
||||||
CARGO_TARGET_DIR=target-native HUB_URL=ws://kipina.studio/ws cargo run --release -p native-node
|
# Tuotantoon
|
||||||
|
CARGO_TARGET_DIR=target-native HUB_URL=wss://kipina.studio/ws cargo run --release -p native-node
|
||||||
```
|
```
|
||||||
|
|
||||||
|
If CUDA is not available, remove the feature: `candle-core = { version = "0.8" }` (without `features = ["cuda"]`).
|
||||||
|
|
||||||
## WebGPU-asetukset selaimessa
|
## WebGPU-asetukset selaimessa
|
||||||
|
|
||||||
WebGPU ei ole oletuksena päällä kaikissa selaimissa. Jos "Liity laskentaverkkoon" -nappi käynnistää CPU-fallbackin vaikka koneessa on näytönohjain:
|
WebGPU ei ole oletuksena päällä kaikissa selaimissa. Jos "Liity laskentaverkkoon" -nappi käynnistää CPU-fallbackin vaikka koneessa on näytönohjain:
|
||||||
@@ -114,19 +126,31 @@ flatpak run com.brave.Browser --enable-unsafe-webgpu --enable-features=Vulkan --
|
|||||||
```
|
```
|
||||||
network-poc/
|
network-poc/
|
||||||
├── hub/ # Keskuspalvelin (Rust + Axum)
|
├── hub/ # Keskuspalvelin (Rust + Axum)
|
||||||
│ └── src/main.rs # WebSocket-reititin, tehtävien jakelu, statistiikat
|
│ └── src/
|
||||||
|
│ ├── main.rs # WebSocket-reititin, tehtävien jakelu, admin HTML, Basic Auth
|
||||||
|
│ └── db.rs # SQLite: node_sessions, pair_results
|
||||||
├── node/ # Selainsolmu (Rust → Wasm)
|
├── node/ # Selainsolmu (Rust → Wasm)
|
||||||
│ └── src/
|
│ └── src/
|
||||||
│ ├── lib.rs # WebGPU/NdArray-laskenta, tokenisaatio, WS-yhteys
|
│ ├── lib.rs # Wasm-entrypoint, tehtävävalinta, WS-handler
|
||||||
│ └── storage.rs # IndexedDB-välimuisti (tokenizer)
|
│ ├── storage.rs # IndexedDB-välimuisti
|
||||||
├── native-node/ # Natiivi-solmu (Rust)
|
│ ├── sampling.rs # Top-k sampling (EOS-penaltti)
|
||||||
│ └── src/main.rs # NVML GPU-tunnistus, sysinfo, WS-yhteys
|
│ ├── smollm.rs # SmolLM 135M inferenssi
|
||||||
|
│ ├── qwen.rs # Qwen2.5 0.5B inferenssi
|
||||||
|
│ ├── qwen_coder.rs # Qwen2.5-Coder 0.5B/3B koodigenerointi
|
||||||
|
│ └── phi3.rs # Phi-3 placeholder
|
||||||
|
├── native-node/ # Natiivi-solmu (Rust + CUDA)
|
||||||
|
│ └── src/
|
||||||
|
│ ├── main.rs # GPU-tunnistus, WS-yhteys, tehtäväkäsittely
|
||||||
|
│ └── inference.rs # Qwen2.5-0.5B Candle-inferenssi (CUDA/CPU)
|
||||||
├── static/
|
├── static/
|
||||||
│ ├── index.html # Dashboard-käyttöliittymä
|
│ ├── index.html # Dashboard + Koodilaboratorio
|
||||||
│ └── pkg/ # Wasm-build (generoidaan)
|
│ └── pkg/ # Wasm-build (generoidaan)
|
||||||
├── docker-compose.yml
|
├── deploy.sh # Lokaali build → palvelimelle
|
||||||
├── Dockerfile.dev # Hub + Wasm-build
|
├── docker-compose.yml # Kehitys
|
||||||
└── Dockerfile.native-node
|
├── docker-compose.prod.yml # Tuotanto (Caddy + Hub)
|
||||||
|
├── docker-compose.client.yml # Client-nodejen Docker
|
||||||
|
├── Dockerfile.prod # Tuotanto-image (cache mount)
|
||||||
|
└── Caddyfile.prod # TLS + reverse proxy
|
||||||
```
|
```
|
||||||
|
|
||||||
## Ympäristömuuttujat
|
## Ympäristömuuttujat
|
||||||
@@ -135,15 +159,27 @@ network-poc/
|
|||||||
|---|---|---|
|
|---|---|---|
|
||||||
| `HUB_URL` | `ws://hub:3000/ws` | Hub-palvelimen WebSocket-osoite (native-node) |
|
| `HUB_URL` | `ws://hub:3000/ws` | Hub-palvelimen WebSocket-osoite (native-node) |
|
||||||
| `ALLOCATED_GB` | `4` | Solmun varaama muisti verkosta (GB) |
|
| `ALLOCATED_GB` | `4` | Solmun varaama muisti verkosta (GB) |
|
||||||
|
| `ADMIN_PASSWORD` | `kipina` | Admin-sivun ja API:n salasana (Basic Auth) |
|
||||||
|
| `DATABASE_PATH` | `nodes.db` | SQLite-tietokannan polku |
|
||||||
|
| `STATIC_DIR` | `../static` | Staattisten tiedostojen kansio |
|
||||||
|
|
||||||
## Kehitysvaihe
|
## Admin-sivu
|
||||||
|
|
||||||
Tämä on proof-of-concept. Toimivat osat:
|
`https://kipina.studio/admin` (Basic Auth, salasana: `ADMIN_PASSWORD`)
|
||||||
- Hub-palvelin, WebSocket-viestintä, dashboard
|
|
||||||
- WebGPU-tensorilaskenta selaimessa (Burn + Wgpu)
|
|
||||||
- CPU-fallback selaimissa ilman WebGPU-tukea (Burn + NdArray)
|
|
||||||
- Natiivi-node nvidia-smi-tason laitteistotiedoilla
|
|
||||||
- Qwen2.5-Coder-tokenizer + IndexedDB-välimuisti
|
|
||||||
- GPU-kuormituksen säätö (duty cycle throttling)
|
|
||||||
|
|
||||||
Seuraavaksi: oikea LLM-inferenssi hajautetusti (mallin painojen lataus, transformer-arkkitehtuuri Wasm/WebGPU:lla).
|
Sisältää:
|
||||||
|
- Node-sessiot: IP, laitetiedot, GPU, WebGPU-tuki, tehtävätyyppi, uptime
|
||||||
|
- Tokenisointitulokset: EN/FI-vertailut, ylikustannus-%
|
||||||
|
- Yhteenvetotilastot: sessiot, WebGPU vs CPU, keskiarvot
|
||||||
|
|
||||||
|
## Projektin tila
|
||||||
|
|
||||||
|
Toimivat ominaisuudet:
|
||||||
|
- Tokenisointivertailu (EN/FI, BPE, top-k sampling)
|
||||||
|
- SmolLM 135M inferenssi selaimessa (Candle + Wasm)
|
||||||
|
- Qwen2.5 0.5B inferenssi selaimessa (Candle + Wasm)
|
||||||
|
- Qwen2.5-Coder 0.5B/3B koodigenerointi (Koodilaboratorio-välilehti)
|
||||||
|
- Native node + CUDA (RTX 4090: ~50-100 tok/s)
|
||||||
|
- Admin-dashboard + SQLite + Basic Auth
|
||||||
|
- Deploy-skripti (lokaali build → palvelin)
|
||||||
|
- WebGPU + CPU fallback, GPU-tunnistus (NVIDIA/AMD/Apple)
|
||||||
|
|||||||
@@ -157,13 +157,30 @@ async function load() {
|
|||||||
{v: stats.avg_overhead_pct + '%', l: 'FI ylikust. (ka.)'},
|
{v: stats.avg_overhead_pct + '%', l: 'FI ylikust. (ka.)'},
|
||||||
].map(s => `<div class="stat-card"><div class="val">${s.v}</div><div class="label">${s.l}</div></div>`).join('');
|
].map(s => `<div class="stat-card"><div class="val">${s.v}</div><div class="label">${s.l}</div></div>`).join('');
|
||||||
|
|
||||||
// Sessions
|
// Sessions — lajittelu: 1) aktiiviset nodet (online + ei viewer), 2) katsojat (online + viewer), 3) offline
|
||||||
|
const taskNames = {'tokenize':'Tokenisaatio','smollm-135m':'SmolLM 135M','qwen-05b':'Qwen2.5 0.5B','phi3-mini':'Phi-3 Mini','qwen-coder-05b':'Coder 0.5B','qwen-coder-3b':'Coder 3B','viewer':'Katsoja'};
|
||||||
|
sessions.sort((a, b) => {
|
||||||
|
const aOnline = !a.disconnected_at;
|
||||||
|
const bOnline = !b.disconnected_at;
|
||||||
|
const aViewer = a.selected_task === 'viewer';
|
||||||
|
const bViewer = b.selected_task === 'viewer';
|
||||||
|
// Online ennen offlinea
|
||||||
|
if (aOnline !== bOnline) return aOnline ? -1 : 1;
|
||||||
|
// Online: aktiiviset nodet ennen katsojia
|
||||||
|
if (aOnline && bOnline && aViewer !== bViewer) return aViewer ? 1 : -1;
|
||||||
|
// Saman ryhmän sisällä: uusin ensin
|
||||||
|
return new Date(b.connected_at) - new Date(a.connected_at);
|
||||||
|
});
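The three-tier ordering used by the sort above (active nodes, then viewers, then offline sessions, newest first within each tier) can also be written as a single comparator. The sketch below restates it in Rust in case the same ordering is ever wanted on the hub side. The `Session` struct is a hypothetical stand-in, and `connected_at` is assumed to be a numeric timestamp rather than the date string the dashboard parses.

```rust
use std::cmp::Ordering;

/// Hypothetical, trimmed-down view of a session row.
struct Session {
    online: bool,
    viewer: bool,
    connected_at: u64, // assumed: seconds since the Unix epoch
}

/// 0 = active node, 1 = online viewer, 2 = offline (viewer or not).
fn group(s: &Session) -> u8 {
    if !s.online {
        2
    } else if s.viewer {
        1
    } else {
        0
    }
}

/// Orders by group first, then newest connection first inside a group.
fn session_order(a: &Session, b: &Session) -> Ordering {
    group(a)
        .cmp(&group(b))
        .then_with(|| b.connected_at.cmp(&a.connected_at))
}
```

Passing `session_order` to `sort_by` gives the same grouping as the JavaScript comparator, with the tie-break expressed through `then_with` instead of a chain of early returns.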
|
||||||
|
|
||||||
document.getElementById('sessions-body').innerHTML = sessions.map(s => {
|
document.getElementById('sessions-body').innerHTML = sessions.map(s => {
|
||||||
const online = !s.disconnected_at;
|
const online = !s.disconnected_at;
|
||||||
const status = online ? '<span class="online">ONLINE</span>' : '<span class="offline">offline</span>';
|
const isViewer = s.selected_task === 'viewer';
|
||||||
|
const status = online
|
||||||
|
? (isViewer ? '<span style="color:#d29922">CONNECTED</span>' : '<span class="online">ACTIVE</span>')
|
||||||
|
: '<span class="offline">offline</span>';
|
||||||
const typeBadge = s.node_type === 'native' ? badge('native','blue') : badge('browser','yellow');
|
const typeBadge = s.node_type === 'native' ? badge('native','blue') : badge('browser','yellow');
|
||||||
const taskNames = {'tokenize':'Tokenisaatio','smollm-135m':'SmolLM 135M','qwen-05b':'Qwen2.5 0.5B','phi3-mini':'Phi-3 Mini'};
|
const taskColor = isViewer ? 'yellow' : s.selected_task === 'tokenize' ? 'green' : 'blue';
|
||||||
const taskBadge = badge(taskNames[s.selected_task] || s.selected_task || 'tokenize', s.selected_task === 'tokenize' ? 'green' : 'blue');
|
const taskBadge = badge(taskNames[s.selected_task] || s.selected_task || '?', taskColor);
|
||||||
const gpuBadge = s.has_webgpu ? badge('WebGPU','green') : badge('CPU','red');
|
const gpuBadge = s.has_webgpu ? badge('WebGPU','green') : badge('CPU','red');
|
||||||
const gpu = s.gpu_name ? `${s.gpu_name}` : '-';
|
const gpu = s.gpu_name ? `${s.gpu_name}` : '-';
|
||||||
const vram = s.vram_total_mb ? `${s.vram_total_mb} MB` : '-';
|
const vram = s.vram_total_mb ? `${s.vram_total_mb} MB` : '-';
|
||||||
@@ -346,27 +363,73 @@ async fn main() {
|
|||||||
}
|
}
|
||||||
|
|
||||||
async fn api_sessions(
|
async fn api_sessions(
|
||||||
|
headers: axum::http::HeaderMap,
|
||||||
axum::extract::State(state): axum::extract::State<Arc<AppState>>,
|
axum::extract::State(state): axum::extract::State<Arc<AppState>>,
|
||||||
) -> impl IntoResponse {
|
) -> axum::response::Response {
|
||||||
axum::Json(state.db.get_sessions(200))
|
if !check_admin_auth(&headers) { return admin_unauthorized(); }
|
||||||
|
axum::Json(state.db.get_sessions(200)).into_response()
|
||||||
}
|
}
|
||||||
|
|
||||||
async fn api_pairs(
|
async fn api_pairs(
|
||||||
|
headers: axum::http::HeaderMap,
|
||||||
axum::extract::State(state): axum::extract::State<Arc<AppState>>,
|
axum::extract::State(state): axum::extract::State<Arc<AppState>>,
|
||||||
) -> impl IntoResponse {
|
) -> axum::response::Response {
|
||||||
axum::Json(state.db.get_pair_results(500))
|
if !check_admin_auth(&headers) { return admin_unauthorized(); }
|
||||||
|
axum::Json(state.db.get_pair_results(500)).into_response()
|
||||||
}
|
}
|
||||||
|
|
||||||
async fn api_stats(
|
async fn api_stats(
|
||||||
|
headers: axum::http::HeaderMap,
|
||||||
axum::extract::State(state): axum::extract::State<Arc<AppState>>,
|
axum::extract::State(state): axum::extract::State<Arc<AppState>>,
|
||||||
) -> impl IntoResponse {
|
) -> axum::response::Response {
|
||||||
|
if !check_admin_auth(&headers) { return admin_unauthorized(); }
|
||||||
let mut stats = state.db.get_stats();
|
let mut stats = state.db.get_stats();
|
||||||
stats.as_object_mut().unwrap().insert("version".to_string(), serde_json::json!(env!("CARGO_PKG_VERSION")));
|
stats.as_object_mut().unwrap().insert("version".to_string(), serde_json::json!(env!("CARGO_PKG_VERSION")));
|
||||||
axum::Json(stats)
|
axum::Json(stats).into_response()
|
||||||
}
|
}
|
||||||
|
|
||||||
async fn admin_page() -> impl IntoResponse {
|
fn check_admin_auth(headers: &axum::http::HeaderMap) -> bool {
|
||||||
axum::response::Html(ADMIN_HTML)
|
let password = std::env::var("ADMIN_PASSWORD").unwrap_or_else(|_| "kipina".to_string());
|
||||||
|
if let Some(auth) = headers.get("authorization").and_then(|v| v.to_str().ok()) {
|
||||||
|
if auth.starts_with("Basic ") {
|
||||||
|
if let Ok(decoded) = String::from_utf8(
|
||||||
|
base64_decode(auth.trim_start_matches("Basic ").trim())
|
||||||
|
) {
|
||||||
|
// Check "user:password": the username is ignored; split at the first ':' so the password may itself contain ':'
|
||||||
|
if let Some((_, pass)) = decoded.split_once(':') {
|
||||||
|
return pass == password;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
}
|
||||||
|
false
|
||||||
|
}
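The credential check above comes down to two steps: pull the password out of the decoded `user:password` string, then compare it with the configured value. A minimal sketch of both steps follows. The helper names are hypothetical, and the comparison is only a best-effort attempt at avoiding an early exit on the first mismatching byte, not a vetted constant-time primitive.

```rust
/// Extracts the password from a decoded "user:password" credential string.
/// Splits at the first ':' only, so the password may contain colons.
fn password_from_credentials(decoded: &str) -> Option<&str> {
    decoded.split_once(':').map(|(_user, pass)| pass)
}

/// Compares two strings without returning at the first differing byte.
/// The length check still reveals whether the lengths match.
fn constant_time_eq(a: &str, b: &str) -> bool {
    let (a, b) = (a.as_bytes(), b.as_bytes());
    if a.len() != b.len() {
        return false;
    }
    // OR together the XOR of every byte pair; zero means all bytes matched.
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}
```

For example, `password_from_credentials("admin:a:b")` yields `Some("a:b")`, whereas taking the second element of a plain `split(':')` would yield only `"a"`.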
|
||||||
|
|
||||||
|
fn base64_decode(input: &str) -> Vec<u8> {
|
||||||
|
// Minimal base64 decoder: characters outside the alphabet are skipped rather than rejected
|
||||||
|
const TABLE: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
|
||||||
|
let mut out = Vec::new();
|
||||||
|
let bytes: Vec<u8> = input.bytes().filter(|&b| b != b'=').collect();
|
||||||
|
for chunk in bytes.chunks(4) {
|
||||||
|
let vals: Vec<u8> = chunk.iter().filter_map(|&b| TABLE.iter().position(|&t| t == b).map(|p| p as u8)).collect();
|
||||||
|
if vals.len() >= 2 { out.push((vals[0] << 2) | (vals[1] >> 4)); }
|
||||||
|
if vals.len() >= 3 { out.push((vals[1] << 4) | (vals[2] >> 2)); }
|
||||||
|
if vals.len() >= 4 { out.push((vals[2] << 6) | vals[3]); }
|
||||||
|
}
|
||||||
|
out
|
||||||
|
}
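Because the decoder above silently skips characters it does not recognise, a malformed `Authorization` header still produces some byte sequence. A stricter variant that rejects bad input might look like the sketch below. `base64_decode_strict` is a hypothetical name, and the function only checks the alphabet, the amount of trailing padding, and the leftover length, not full RFC 4648 canonical form.

```rust
/// Decodes standard-alphabet base64, returning None on invalid input.
fn base64_decode_strict(input: &str) -> Option<Vec<u8>> {
    // Maps one base64 character to its 6-bit value.
    fn sextet(b: u8) -> Option<u8> {
        match b {
            b'A'..=b'Z' => Some(b - b'A'),
            b'a'..=b'z' => Some(b - b'a' + 26),
            b'0'..=b'9' => Some(b - b'0' + 52),
            b'+' => Some(62),
            b'/' => Some(63),
            _ => None,
        }
    }

    // Padding may only appear at the end, and at most twice.
    let trimmed = input.trim_end_matches('=');
    if input.len() - trimmed.len() > 2 {
        return None;
    }
    // Any character outside the alphabet (including a stray '=') fails here.
    let vals: Vec<u8> = trimmed.bytes().map(sextet).collect::<Option<_>>()?;
    // A single leftover character cannot encode a whole byte.
    if vals.len() % 4 == 1 {
        return None;
    }

    let mut out = Vec::with_capacity(vals.len() * 3 / 4);
    for chunk in vals.chunks(4) {
        out.push((chunk[0] << 2) | (chunk[1] >> 4));
        if chunk.len() >= 3 {
            out.push((chunk[1] << 4) | (chunk[2] >> 2));
        }
        if chunk.len() >= 4 {
            out.push((chunk[2] << 6) | chunk[3]);
        }
    }
    Some(out)
}
```

Returning `Option` lets the caller treat an undecodable header as unauthorized instead of comparing against a partially decoded string.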
|
||||||
|
|
||||||
|
fn admin_unauthorized() -> axum::response::Response {
|
||||||
|
axum::response::Response::builder()
|
||||||
|
.status(401)
|
||||||
|
.header("WWW-Authenticate", "Basic realm=\"Kipinä Admin\"")
|
||||||
|
.body(axum::body::Body::from("Unauthorized"))
|
||||||
|
.unwrap()
|
||||||
|
}
|
||||||
|
|
||||||
|
async fn admin_page(headers: axum::http::HeaderMap) -> axum::response::Response {
|
||||||
|
if !check_admin_auth(&headers) { return admin_unauthorized(); }
|
||||||
|
axum::response::Html(ADMIN_HTML).into_response()
|
||||||
}
|
}
|
||||||
|
|
||||||
async fn ws_handler(
|
async fn ws_handler(
|
||||||
|
|||||||
@@ -12,7 +12,7 @@ serde_json = "1.0"
|
|||||||
sysinfo = "0.30"
|
sysinfo = "0.30"
|
||||||
nvml-wrapper = "0.10"
|
nvml-wrapper = "0.10"
|
||||||
wgpu = "24"
|
wgpu = "24"
|
||||||
candle-core = { version = "0.8" }
|
candle-core = { version = "0.8", features = ["cuda"] }
|
||||||
candle-nn = "0.8"
|
candle-nn = "0.8"
|
||||||
candle-transformers = "0.8"
|
candle-transformers = "0.8"
|
||||||
hf-hub = "0.4"
|
hf-hub = "0.4"
|
||||||
|
|||||||
@@ -7,6 +7,7 @@ use burn::tensor::Tensor;
|
|||||||
use burn::backend::{Wgpu, NdArray};
|
use burn::backend::{Wgpu, NdArray};
|
||||||
|
|
||||||
pub mod storage;
|
pub mod storage;
|
||||||
|
pub mod sampling;
|
||||||
pub mod smollm;
|
pub mod smollm;
|
||||||
pub mod qwen;
|
pub mod qwen;
|
||||||
pub mod qwen_coder;
|
pub mod qwen_coder;
|
||||||
|
|||||||
@@ -154,7 +154,7 @@ pub async fn run_qwen_inference(prompt: String, ws: Rc<RefCell<WebSocket>>) {
|
|||||||
} else {
|
} else {
|
||||||
logits // jo [vocab_size]
|
logits // jo [vocab_size]
|
||||||
};
|
};
|
||||||
let mut next_token = logits.argmax(0).unwrap().to_vec0::<u32>().unwrap();
|
let mut next_token = crate::sampling::sample_top_k(&logits, 10, 5.0);
|
||||||
console_log!("[Qwen] Ensimmäinen token: {}", next_token);
|
console_log!("[Qwen] Ensimmäinen token: {}", next_token);
|
||||||
|
|
||||||
let eos_token = 151645u32; // <|im_end|> for Qwen2.5
|
let eos_token = 151645u32; // <|im_end|> for Qwen2.5
|
||||||
@@ -188,7 +188,7 @@ pub async fn run_qwen_inference(prompt: String, ws: Rc<RefCell<WebSocket>>) {
|
|||||||
} else {
|
} else {
|
||||||
logits
|
logits
|
||||||
};
|
};
|
||||||
next_token = logits.argmax(0).unwrap().to_vec0::<u32>().unwrap();
|
next_token = crate::sampling::sample_top_k(&logits, 10, 5.0);
|
||||||
pos += 1;
|
pos += 1;
|
||||||
|
|
||||||
if next_token == eos_token { break; }
|
if next_token == eos_token { break; }
|
||||||
|
|||||||
@@ -173,8 +173,22 @@ pub async fn run_coder_inference(prompt: String, ws: Rc<RefCell<WebSocket>>, use
|
|||||||
let load_time = perf.now() - start_load;
|
let load_time = perf.now() - start_load;
|
||||||
console_log!("[Coder] Malli ladattu ({:.0}ms). Generoidaan...", load_time);
|
console_log!("[Coder] Malli ladattu ({:.0}ms). Generoidaan...", load_time);
|
||||||
|
|
||||||
// Muotoillaan chat-template
|
// Parsitaan JSON-prompti tai käytetään teksti sellaisenaan
|
||||||
let formatted = format!("<|im_start|>system\nYou are a Python coding assistant. Write only code, no explanations.<|im_end|>\n<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n", prompt);
|
let (actual_prompt, system_msg, max_new_tokens) = if prompt.starts_with('{') {
|
||||||
|
if let Ok(json) = serde_json::from_str::<serde_json::Value>(&prompt) {
|
||||||
|
let p = json.get("prompt").and_then(|v| v.as_str()).unwrap_or(&prompt).to_string();
|
||||||
|
let s = json.get("system").and_then(|v| v.as_str())
|
||||||
|
.unwrap_or("You are a Python coding assistant. Write only code, no explanations.").to_string();
|
||||||
|
let m = json.get("max_tokens").and_then(|v| v.as_u64()).unwrap_or(128) as usize;
|
||||||
|
(p, s, m)
|
||||||
|
} else {
|
||||||
|
(prompt.clone(), "You are a Python coding assistant. Write only code, no explanations.".to_string(), 128)
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
(prompt.clone(), "You are a Python coding assistant. Write only code, no explanations.".to_string(), 128)
|
||||||
|
};
|
||||||
|
|
||||||
|
let formatted = format!("<|im_start|>system\n{}<|im_end|>\n<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n", system_msg, actual_prompt);
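The prompt assembly above can be isolated into two small helpers, which makes the chat template testable on its own and puts an upper bound on the caller-supplied `max_tokens`. This is a sketch only: `format_chatml` and `clamp_max_tokens` are hypothetical names, and the ceiling is an assumption, since the code in this diff applies no upper limit.

```rust
/// Builds a ChatML-style prompt with a system turn and a user turn,
/// leaving the assistant turn open for generation.
fn format_chatml(system_msg: &str, user_prompt: &str) -> String {
    format!(
        "<|im_start|>system\n{}<|im_end|>\n<|im_start|>user\n{}<|im_end|>\n<|im_start|>assistant\n",
        system_msg, user_prompt
    )
}

/// Resolves the generation budget: the default when nothing was requested,
/// otherwise the requested value limited to the range 1..=ceiling.
fn clamp_max_tokens(requested: Option<u64>, default: usize, ceiling: usize) -> usize {
    match requested {
        // Clamp in u64 first so the cast to usize cannot truncate on wasm32.
        Some(n) => (n.min(ceiling as u64) as usize).max(1),
        None => default.min(ceiling),
    }
}
```

Clamping before the cast matters on `wasm32`, where `usize` is 32 bits wide and a plain `as usize` on a large `u64` would wrap around.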
|
||||||
|
|
||||||
let encoding = match tokenizer.encode(formatted.as_str(), true) {
|
let encoding = match tokenizer.encode(formatted.as_str(), true) {
|
||||||
Ok(e) => e,
|
Ok(e) => e,
|
||||||
@@ -185,7 +199,7 @@ pub async fn run_coder_inference(prompt: String, ws: Rc<RefCell<WebSocket>>, use
|
|||||||
console_log!("[Coder] Syöte: {} tokenia", input_len);
|
console_log!("[Coder] Syöte: {} tokenia", input_len);
|
||||||
|
|
||||||
let start_gen = perf.now();
|
let start_gen = perf.now();
|
||||||
let max_new_tokens = 128; // Koodille enemmän tokeneita
|
// max_new_tokens tulee JSON-promptista tai oletuksena 128
|
||||||
let mut generated_text = String::new();
|
let mut generated_text = String::new();
|
||||||
let mut tokens_generated: usize = 0;
|
let mut tokens_generated: usize = 0;
|
||||||
let eos_token = 151645u32;
|
let eos_token = 151645u32;
|
||||||
@@ -206,7 +220,7 @@ pub async fn run_coder_inference(prompt: String, ws: Rc<RefCell<WebSocket>>, use
|
|||||||
} else {
|
} else {
|
||||||
logits
|
logits
|
||||||
};
|
};
|
||||||
let mut next_token = logits.argmax(0).unwrap().to_vec0::<u32>().unwrap();
|
let mut next_token = crate::sampling::sample_top_k(&logits, 10, 5.0);
|
||||||
|
|
||||||
if next_token != eos_token {
|
if next_token != eos_token {
|
||||||
if let Ok(text) = tokenizer.decode(&[next_token], true) {
|
if let Ok(text) = tokenizer.decode(&[next_token], true) {
|
||||||
@@ -237,7 +251,7 @@ pub async fn run_coder_inference(prompt: String, ws: Rc<RefCell<WebSocket>>, use
|
|||||||
} else {
|
} else {
|
||||||
logits
|
logits
|
||||||
};
|
};
|
||||||
next_token = logits.argmax(0).unwrap().to_vec0::<u32>().unwrap();
|
next_token = crate::sampling::sample_top_k(&logits, 10, 5.0);
|
||||||
pos += 1;
|
pos += 1;
|
||||||
|
|
||||||
if next_token == eos_token { break; }
|
if next_token == eos_token { break; }
|
||||||
|
|||||||
47
network-poc/node/src/sampling.rs
Normal file
47
network-poc/node/src/sampling.rs
Normal file
@@ -0,0 +1,47 @@
|
|||||||
|
use candle_core::Tensor;
|
||||||
|
|
||||||
|
/// Top-k sampling without Candle's softmax op; works around the Candle SoftmaxLastDim bug on Wasm.
|
||||||
|
/// Keeps the k highest logits and draws one of them at random, weighted by probability.
|
||||||
|
/// With k=1 this behaves like argmax (greedy).
|
||||||
|
pub fn sample_top_k(logits: &Tensor, k: usize, eos_penalty: f32) -> u32 {
|
||||||
|
// Muunnetaan Vec<f32>:ksi
|
||||||
|
let logits_vec: Vec<f32> = logits.to_vec1::<f32>().unwrap_or_default();
|
||||||
|
if logits_vec.is_empty() { return 0; }
|
||||||
|
|
||||||
|
// Rangotaan ja otetaan top-k indeksit
|
||||||
|
let mut indexed: Vec<(usize, f32)> = logits_vec.iter().enumerate().map(|(i, &v)| (i, v)).collect();
|
||||||
|
indexed.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
|
||||||
|
indexed.truncate(k.max(1)); // keep at least one candidate so k = 0 cannot leave the list empty
|
||||||
|
|
||||||
|
// EOS penalty: lower the logit of the EOS token. Both IDs are checked for every model, so token 2 is also penalized when running Qwen.
|
||||||
|
for item in indexed.iter_mut() {
|
||||||
|
if item.0 == 2 || item.0 == 151645 { // SmolLM EOS=2, Qwen EOS=151645
|
||||||
|
item.1 -= eos_penalty;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
if k == 1 {
|
||||||
|
return indexed[0].0 as u32;
|
||||||
|
}
|
||||||
|
|
||||||
|
// Yksinkertainen "softmax" top-k:lle CPU:lla
|
||||||
|
let max_logit = indexed.iter().map(|x| x.1).fold(f32::NEG_INFINITY, f32::max);
|
||||||
|
let exps: Vec<f32> = indexed.iter().map(|x| (x.1 - max_logit).exp()).collect();
|
||||||
|
let sum: f32 = exps.iter().sum();
|
||||||
|
let probs: Vec<f32> = exps.iter().map(|e| e / sum).collect();
|
||||||
|
|
||||||
|
// Random pick by cumulative probability
|
||||||
|
// One xorshift-style mix of the clock value (no getrandom needed); this is not a stateful PRNG, so calls that see the same timestamp draw the same value
|
||||||
|
let seed = (js_sys::Date::now() * 1000.0) as u64;
|
||||||
|
let rand_val = ((seed ^ (seed >> 13) ^ (seed << 7)) % 10000) as f32 / 10000.0;
|
||||||
|
|
||||||
|
let mut cumulative = 0.0;
|
||||||
|
for (i, p) in probs.iter().enumerate() {
|
||||||
|
cumulative += p;
|
||||||
|
if rand_val < cumulative {
|
||||||
|
return indexed[i].0 as u32;
|
||||||
|
}
|
||||||
|
}
|
||||||
|
|
||||||
|
indexed[0].0 as u32
|
||||||
|
}
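For testing outside Wasm, the same top-k idea can be expressed over a plain `&[f32]` with an injected random source. The sketch below is an illustration under stated assumptions, not the code in this commit. It differs in three ways: `XorShift64` is a hypothetical stateful generator (shift triple 13/7/17) instead of a clock-derived value, the EOS IDs are passed in by the caller rather than hard-coded, and the penalty is applied before ranking rather than after truncation.

```rust
/// Stateful xorshift64 generator; the seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    /// Returns a value in [0, 1).
    fn next_f32(&mut self) -> f32 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        // Keep the top 24 bits so the result is exactly representable in f32.
        (x >> 40) as f32 / (1u64 << 24) as f32
    }
}

/// Top-k sampling over plain logits; returns None for empty input or k = 0.
fn sample_top_k(
    logits: &[f32],
    k: usize,
    eos_ids: &[usize],
    eos_penalty: f32,
    rng: &mut XorShift64,
) -> Option<u32> {
    if logits.is_empty() || k == 0 {
        return None;
    }
    // Penalize EOS candidates first, then rank by descending logit.
    let mut indexed: Vec<(usize, f32)> = logits
        .iter()
        .copied()
        .enumerate()
        .map(|(i, v)| if eos_ids.contains(&i) { (i, v - eos_penalty) } else { (i, v) })
        .collect();
    indexed.sort_by(|a, b| b.1.total_cmp(&a.1));
    indexed.truncate(k);

    // Unnormalized softmax weights, shifted by the max for numerical stability.
    let max = indexed[0].1;
    let weights: Vec<f32> = indexed.iter().map(|x| (x.1 - max).exp()).collect();
    let total: f32 = weights.iter().sum();

    // Walk the cumulative weights until the random threshold is used up.
    let mut threshold = rng.next_f32() * total;
    for (item, w) in indexed.iter().zip(&weights) {
        if threshold < *w {
            return Some(item.0 as u32);
        }
        threshold -= w;
    }
    Some(indexed[0].0 as u32)
}
```

Passing the generator in keeps the function deterministic under a fixed seed, which is what makes it possible to unit-test the sampling path.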
|
||||||
@@ -196,7 +196,7 @@ pub async fn run_smollm_inference(prompt: String, ws: Rc<RefCell<WebSocket>>) {
|
|||||||
} else {
|
} else {
|
||||||
logits
|
logits
|
||||||
};
|
};
|
||||||
let mut next_token = logits.argmax(0).unwrap().to_vec0::<u32>().unwrap();
|
let mut next_token = crate::sampling::sample_top_k(&logits, 10, 5.0);
|
||||||
console_log!("[SmolLM] Ensimmäinen generoitu token: {}", next_token);
|
console_log!("[SmolLM] Ensimmäinen generoitu token: {}", next_token);
|
||||||
pos = input_len;
|
pos = input_len;
|
||||||
|
|
||||||
@@ -229,7 +229,7 @@ pub async fn run_smollm_inference(prompt: String, ws: Rc<RefCell<WebSocket>>) {
|
|||||||
} else {
|
} else {
|
||||||
logits
|
logits
|
||||||
};
|
};
|
||||||
next_token = logits.argmax(0).unwrap().to_vec0::<u32>().unwrap();
|
next_token = crate::sampling::sample_top_k(&logits, 10, 5.0);
|
||||||
pos += 1;
|
pos += 1;
|
||||||
|
|
||||||
if next_token == 2 { break; }
|
if next_token == 2 { break; }
|
||||||
|
|||||||
@@ -567,9 +567,37 @@
|
|||||||
</div>
|
</div>
|
||||||
</label>
|
</label>
|
||||||
</div>
|
</div>
|
||||||
<div style="display:flex;gap:8px">
|
<div style="display:flex;gap:8px;align-items:start">
|
||||||
<input type="text" id="code-input" placeholder="e.g. Write a Python function that checks if a number is prime" style="flex:1;background:var(--panel-bg);border:1px solid var(--border-color);border-radius:4px;padding:8px 12px;color:var(--text-color);font-size:14px;outline:none">
|
<div style="flex:1">
|
||||||
<button id="code-send-btn" style="background:#238636;color:#fff;border:1px solid rgba(240,246,252,0.1);border-radius:4px;padding:8px 16px;font-size:14px;cursor:pointer">Generate</button>
|
<div style="display:flex;gap:8px;margin-bottom:4px">
|
||||||
|
<input type="text" id="code-input" placeholder='e.g. Write a Python function that checks if a number is prime' style="flex:1;background:var(--panel-bg);border:1px solid var(--border-color);border-radius:4px;padding:8px 12px;color:var(--text-color);font-size:14px;outline:none;display:block" >
|
||||||
|
<textarea id="code-input-json" placeholder='{"prompt":"Write a fibonacci function","system":"You are a Python expert","max_tokens":128}' style="flex:1;background:var(--panel-bg);border:1px solid var(--border-color);border-radius:4px;padding:8px 12px;color:var(--text-color);font-size:13px;font-family:Courier New,monospace;outline:none;resize:vertical;min-height:60px;display:none"></textarea>
|
||||||
|
<button id="code-send-btn" style="background:#238636;color:#fff;border:1px solid rgba(240,246,252,0.1);border-radius:4px;padding:8px 16px;font-size:14px;cursor:pointer;align-self:stretch">Generate</button>
|
||||||
|
</div>
|
||||||
|
<div style="display:flex;justify-content:space-between;align-items:center">
|
||||||
|
<label style="font-size:11px;color:#8b949e;cursor:pointer;display:flex;align-items:center;gap:4px">
|
||||||
|
<input type="checkbox" id="json-mode-toggle" style="accent-color:var(--accent-color)"> JSON mode
|
||||||
|
</label>
|
||||||
|
<details id="json-help" style="font-size:11px;color:#8b949e;display:none">
|
||||||
|
<summary style="cursor:pointer;color:var(--accent-color)">JSON syntax</summary>
|
||||||
|
<div style="background:#010409;border:1px solid var(--border-color);border-radius:4px;padding:10px;margin-top:6px;font-family:Courier New,monospace;font-size:12px;line-height:1.6;color:var(--text-color)">
|
||||||
|
{<br>
|
||||||
|
<span style="color:#79c0ff">"prompt"</span>: <span style="color:#a5d6ff">"Write a bubble sort"</span>,<br>
|
||||||
|
<span style="color:#79c0ff">"system"</span>: <span style="color:#a5d6ff">"You are a Python expert. Write only code."</span>,<br>
|
||||||
|
<span style="color:#79c0ff">"max_tokens"</span>: <span style="color:#79c0ff">128</span>,<br>
|
||||||
|
<span style="color:#79c0ff">"language"</span>: <span style="color:#a5d6ff">"python"</span><br>
|
||||||
|
}
|
||||||
|
<div style="margin-top:8px;color:#8b949e;font-family:sans-serif">
|
||||||
|
<strong style="color:var(--text-color)">Fields:</strong><br>
|
||||||
|
<code>prompt</code> (required) — the coding task<br>
|
||||||
|
<code>system</code> — system prompt override<br>
|
||||||
|
<code>max_tokens</code> — max tokens to generate (default: 128)<br>
|
||||||
|
<code>language</code> — hint for syntax highlighting
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</details>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
</div>
|
</div>
|
||||||
<div id="code-loading" style="display:none;margin-top:8px;font-size:12px;color:#d29922">Starting Coder model...</div>
|
<div id="code-loading" style="display:none;margin-top:8px;font-size:12px;color:#d29922">Starting Coder model...</div>
|
||||||
</div>
|
</div>
|
||||||
@@ -634,12 +662,19 @@
|
|||||||
document.querySelectorAll('.main-panel').forEach(p => p.classList.remove('active'));
|
document.querySelectorAll('.main-panel').forEach(p => p.classList.remove('active'));
|
||||||
document.querySelectorAll('.main-tab').forEach(t => t.classList.remove('active'));
|
document.querySelectorAll('.main-tab').forEach(t => t.classList.remove('active'));
|
||||||
document.getElementById('panel-' + tab).classList.add('active');
|
document.getElementById('panel-' + tab).classList.add('active');
|
||||||
event.target.classList.add('active');
|
document.querySelector(`.main-tab[onclick*="${tab}"]`).classList.add('active');
|
||||||
|
window.location.hash = tab;
|
||||||
};
|
};
|
||||||
|
|
||||||
|
// URL hash navigation: #codelab or #network
|
||||||
|
if (window.location.hash === '#codelab') {
|
||||||
|
switchMainTab('codelab');
|
||||||
|
}
|
||||||
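Review note: the hash check above only handles `#codelab` at page load. A minimal sketch of a whitelist-based lookup follows, assuming the only tab names are `network` and `codelab` (taken from the comment above) and that `network` is the default; `MAIN_TABS` and `tabFromHash` are hypothetical names, not part of this commit.

```javascript
// Assumed tab names, based on the "#codelab or #network" comment in the diff.
const MAIN_TABS = ['network', 'codelab'];

// Map a location.hash value to a known tab name, falling back to a default
// so that an unknown hash never reaches getElementById('panel-' + tab).
function tabFromHash(hash, fallback = 'network') {
  const name = (hash || '').replace(/^#/, '');
  return MAIN_TABS.includes(name) ? name : fallback;
}
```

In the page this could be called as `switchMainTab(tabFromHash(window.location.hash))`, and the same function could serve a `hashchange` listener if back/forward navigation should also switch tabs.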
|
|
||||||
// Code lab state
|
// Code lab state
|
||||||
const codeMetrics = { tasks: 0, tokens: 0, lastSpeed: 0 };
|
const codeMetrics = { tasks: 0, tokens: 0, lastSpeed: 0 };
|
||||||
let coderJoined = false;
|
let coderJoined = false;
|
||||||
|
let wasmInitialized = false;
|
||||||
let coderSize = '05b'; // '05b' or '3b'
|
let coderSize = '05b'; // '05b' or '3b'
|
||||||
|
|
||||||
// Model selection radio buttons
|
// Model selection radio buttons
|
||||||
@@ -780,6 +815,30 @@
|
|||||||
|
|
||||||
// Connect the page's UI side (JS) to its own passive WebSocket listener.
|
// Connect the page's UI side (JS) to its own passive WebSocket listener.
|
||||||
const uiSocket = new WebSocket(`${window.location.protocol === 'https:' ? 'wss:' : 'ws:'}//${window.location.host}/ws`);
|
const uiSocket = new WebSocket(`${window.location.protocol === 'https:' ? 'wss:' : 'ws:'}//${window.location.host}/ws`);
|
||||||
|
uiSocket.onopen = () => {
|
||||||
|
const el = document.getElementById('node-status');
|
||||||
|
el.textContent = 'Connected';
|
||||||
|
el.style.color = '#d29922';
|
||||||
|
|
||||||
|
// Send a lightweight auth message right away so the admin sees the visitor immediately
|
||||||
|
const hasGPU = !!navigator.gpu;
|
||||||
|
uiSocket.send(JSON.stringify({
|
||||||
|
type: 'auth',
|
||||||
|
status: 'viewer',
|
||||||
|
node_type: 'browser',
|
||||||
|
platform: navigator.platform || '',
|
||||||
|
cpu_cores: navigator.hardwareConcurrency || 0,
|
||||||
|
device_memory_gb: navigator.deviceMemory || 0,
|
||||||
|
allocated_gb: 0,
|
||||||
|
selected_task: 'viewer',
|
||||||
|
has_webgpu: hasGPU,
|
||||||
|
}));
|
||||||
|
};
|
||||||
|
uiSocket.onclose = () => {
|
||||||
|
const el = document.getElementById('node-status');
|
||||||
|
el.textContent = 'Disconnected';
|
||||||
|
el.style.color = '#f85149';
|
||||||
|
};
|
||||||
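Review note: the URL construction and the viewer `auth` payload above could be pulled into pure functions so they can be checked outside a browser. This is a sketch only; `wsUrlFor` and `buildViewerAuth` are hypothetical names, and the field set simply mirrors the message sent in `uiSocket.onopen`.

```javascript
// Build the WebSocket URL from a location-like object ({ protocol, host }).
function wsUrlFor(loc) {
  const scheme = loc.protocol === 'https:' ? 'wss:' : 'ws:';
  return `${scheme}//${loc.host}/ws`;
}

// Build the lightweight viewer auth message from a navigator-like object.
// Missing browser properties fall back to '' or 0, as in the diff.
function buildViewerAuth(nav) {
  return {
    type: 'auth',
    status: 'viewer',
    node_type: 'browser',
    platform: nav.platform || '',
    cpu_cores: nav.hardwareConcurrency || 0,
    device_memory_gb: nav.deviceMemory || 0,
    allocated_gb: 0,
    selected_task: 'viewer',
    has_webgpu: !!nav.gpu,
  };
}
```

In the page this would be used as `new WebSocket(wsUrlFor(window.location))` and `uiSocket.send(JSON.stringify(buildViewerAuth(navigator)))`.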
uiSocket.onmessage = (event) => {
|
uiSocket.onmessage = (event) => {
|
||||||
try {
|
try {
|
||||||
const data = JSON.parse(event.data);
|
const data = JSON.parse(event.data);
|
||||||
@@ -980,18 +1039,20 @@
|
|||||||
}
|
}
|
||||||
|
|
||||||
const gpuStr = hasWebGPU ? (deviceInfo.gpu?.description || deviceInfo.gpu?.vendor || "WebGPU") : "ei GPU:ta";
|
const gpuStr = hasWebGPU ? (deviceInfo.gpu?.description || deviceInfo.gpu?.vendor || "WebGPU") : "ei GPU:ta";
|
||||||
const backendStr = hasWebGPU ? "WebGPU" : "CPU (NdArray)";
|
// Compute always uses the CPU (Candle); WebGPU is used only for tensor computation (Burn)
|
||||||
|
const computeBackend = (selectedTask === 'tokenize')
|
||||||
|
? (hasWebGPU ? "WebGPU + CPU" : "CPU")
|
||||||
|
: "CPU (Candle Wasm)";
|
||||||
const vramStr = deviceInfo.gpu?.estimated_vram_gb ? `~${deviceInfo.gpu.estimated_vram_gb} GB` : "?";
|
const vramStr = deviceInfo.gpu?.estimated_vram_gb ? `~${deviceInfo.gpu.estimated_vram_gb} GB` : "?";
|
||||||
|
|
||||||
// navigator.deviceMemory is capped at 8 GB, so mark the value as an estimate
|
|
||||||
const ramNote = deviceInfo.device_memory_gb >= 8 ? "8+ GB (selaimen raja)" : `~${deviceInfo.device_memory_gb} GB`;
|
const ramNote = deviceInfo.device_memory_gb >= 8 ? "8+ GB (selaimen raja)" : `~${deviceInfo.device_memory_gb} GB`;
|
||||||
|
|
||||||
// Show the device info in the panel
|
// Show the device info in the panel
|
||||||
const diPanel = document.getElementById('device-info');
|
const diPanel = document.getElementById('device-info');
|
||||||
diPanel.style.display = 'block';
|
diPanel.style.display = 'block';
|
||||||
diPanel.innerHTML = [
|
diPanel.innerHTML = [
|
||||||
`Backend: <span>${backendStr}</span>`,
|
`Laskenta: <span>${computeBackend}</span>`,
|
||||||
`GPU: <span>${gpuStr}</span>`,
|
hasWebGPU ? `GPU: <span>${gpuStr}</span>` : `GPU: <span style="color:#f85149">ei WebGPU:ta</span>`,
|
||||||
hasWebGPU ? `VRAM: <span>${vramStr}</span>` : null,
|
hasWebGPU ? `VRAM: <span>${vramStr}</span>` : null,
|
||||||
`CPU: <span>${deviceInfo.cpu_cores} ydintä</span>`,
|
`CPU: <span>${deviceInfo.cpu_cores} ydintä</span>`,
|
||||||
`RAM: <span>${ramNote}</span>`,
|
`RAM: <span>${ramNote}</span>`,
|
||||||
@@ -1004,7 +1065,7 @@
|
|||||||
|
|
||||||
if (hasWebGPU) {
|
if (hasWebGPU) {
|
||||||
banner.className = 'compat-banner gpu';
|
banner.className = 'compat-banner gpu';
|
||||||
banner.innerHTML = `GPU-kiihdytys aktiivinen — ${gpuStr}`;
|
banner.innerHTML = `WebGPU tunnistettu — ${gpuStr}. Tokenisaatio käyttää GPU:ta, LLM-inferenssi CPU:ta (Candle Wasm).`;
|
||||||
} else {
|
} else {
|
||||||
// Detect the browser to tailor the instructions
|
// Detect the browser to tailor the instructions
|
||||||
const ua = navigator.userAgent;
|
const ua = navigator.userAgent;
|
||||||
@@ -1063,8 +1124,11 @@
|
|||||||
}
|
}
|
||||||
|
|
||||||
try {
|
try {
|
||||||
|
if (!wasmInitialized) {
|
||||||
console.log("Ladataan Burn Wasm -binääriä...");
|
console.log("Ladataan Burn Wasm -binääriä...");
|
||||||
await init();
|
await init();
|
||||||
|
wasmInitialized = true;
|
||||||
|
}
|
||||||
window.wasm_active = true;
|
window.wasm_active = true;
|
||||||
metrics.startTime = Date.now();
|
metrics.startTime = Date.now();
|
||||||
|
|
||||||
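Review note: the `wasmInitialized` flag is set only after `await init()` resolves, so two callers that start before the first load finishes would both call `init()`. A minimal sketch of one way around that is to cache the promise instead of a boolean. The helper name `once` is hypothetical, and whether calling `init()` twice is actually harmful here is not established by the diff.

```javascript
// Wrap an async function so that concurrent and repeated calls share a single
// in-flight (or completed) promise. A failed call clears the cache so a later
// call can retry.
function once(fn) {
  let promise = null;
  return () => {
    if (promise === null) {
      promise = Promise.resolve()
        .then(fn)
        .catch((err) => {
          promise = null; // allow a retry after a failed load
          throw err;
        });
    }
    return promise;
  };
}

// Possible use in the page (init is the wasm loader already imported there):
//   const initOnce = once(init);
//   await initOnce();
```

Both call sites in this diff could then share one `initOnce` and drop the separate flag.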
@@ -1095,6 +1159,23 @@
|
|||||||
let coderWs = null; // Separate WS for the coder node
|
let coderWs = null; // Separate WS for the coder node
|
||||||
let pendingCodePrompt = null;
|
let pendingCodePrompt = null;
|
||||||
|
|
||||||
|
// Simple Python syntax highlighting
|
||||||
|
function highlightPython(code) {
|
||||||
|
return code
|
||||||
|
// Comments
|
||||||
|
.replace(/(#.*)/g, '<span style="color:#8b949e">$1</span>')
|
||||||
|
// Strings (f-strings and plain)
|
||||||
|
.replace(/(f?"[^"]*"|f?'[^']*')/g, '<span style="color:#a5d6ff">$1</span>')
|
||||||
|
// Keywords
|
||||||
|
.replace(/\b(def|return|if|elif|else|for|while|in|not|and|or|is|import|from|class|try|except|with|as|lambda|yield|True|False|None|raise|pass|break|continue)\b/g, '<span style="color:#ff7b72">$1</span>')
|
||||||
|
// Built-in functions
|
||||||
|
.replace(/\b(print|len|range|int|str|float|list|dict|set|tuple|type|isinstance|enumerate|zip|map|filter|sorted|reversed|sum|min|max|abs|round|input|open)\b/g, '<span style="color:#d2a8ff">$1</span>')
|
||||||
|
// Numbers
|
||||||
|
.replace(/\b(\d+\.?\d*)\b/g, '<span style="color:#79c0ff">$1</span>')
|
||||||
|
// Decorators
|
||||||
|
.replace(/(@\w+)/g, '<span style="color:#d2a8ff">$1</span>');
|
||||||
|
}
|
||||||
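Review note: because `highlightPython` chains `.replace()` calls, later passes also scan the markup that earlier passes inserted. For example, after the comment pass wraps `# note` in a `<span style="color:#8b949e">`, the string pass matches the quoted `"color:#8b949e"` attribute and wraps it again, which breaks the tag. The model output is also inserted without HTML escaping. A minimal single-pass sketch that avoids both problems follows; `escapeHtml`, `PY_TOKEN` and `highlightPythonOnce` are hypothetical names, and the colors and word lists are copied from the diff.

```javascript
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;')
          .replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

const PY_KEYWORDS = new Set(['def', 'return', 'if', 'elif', 'else', 'for', 'while',
  'in', 'not', 'and', 'or', 'is', 'import', 'from', 'class', 'try', 'except', 'with',
  'as', 'lambda', 'yield', 'True', 'False', 'None', 'raise', 'pass', 'break', 'continue']);
const PY_BUILTINS = new Set(['print', 'len', 'range', 'int', 'str', 'float', 'list',
  'dict', 'set', 'tuple', 'type', 'isinstance', 'enumerate', 'zip', 'map', 'filter',
  'sorted', 'reversed', 'sum', 'min', 'max', 'abs', 'round', 'input', 'open']);

// One alternation, scanned left to right: comment, string, decorator, number, identifier.
const PY_TOKEN = /(#.*)|(f?"[^"\n]*"|f?'[^'\n]*')|(@\w+)|(\b\d+\.?\d*\b)|(\b[A-Za-z_]\w*\b)/g;

function highlightPythonOnce(code) {
  let out = '';
  let last = 0;
  for (const m of code.matchAll(PY_TOKEN)) {
    out += escapeHtml(code.slice(last, m.index)); // text between tokens
    let color = null;
    if (m[1]) color = '#8b949e';                        // comment
    else if (m[2]) color = '#a5d6ff';                   // string
    else if (m[3]) color = '#d2a8ff';                   // decorator
    else if (m[4]) color = '#79c0ff';                   // number
    else if (PY_KEYWORDS.has(m[0])) color = '#ff7b72';  // keyword
    else if (PY_BUILTINS.has(m[0])) color = '#d2a8ff';  // built-in
    const text = escapeHtml(m[0]);
    out += color ? `<span style="color:${color}">${text}</span>` : text;
    last = m.index + m[0].length;
  }
  return out + escapeHtml(code.slice(last));
}
```

Each character of the input is consumed exactly once, so a `#` inside a string stays part of the string and generated markup is never re-scanned. Triple-quoted and escaped-quote strings are still not handled, as in the original.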
|
|
||||||
function addCodeResult(data) {
|
function addCodeResult(data) {
|
||||||
const model = data.model || 'Coder';
|
const model = data.model || 'Coder';
|
||||||
const tokGen = data.tokens_generated || 0;
|
const tokGen = data.tokens_generated || 0;
|
||||||
@@ -1122,7 +1203,7 @@
|
|||||||
card.className = 'code-task-card';
|
card.className = 'code-task-card';
|
||||||
card.innerHTML = `
|
card.innerHTML = `
|
||||||
<div class="prompt">${data.prompt || ''}</div>
|
<div class="prompt">${data.prompt || ''}</div>
|
||||||
<div class="code-output">${response}</div>
|
<div class="code-output">${highlightPython(response)}</div>
|
||||||
<div class="meta">
|
<div class="meta">
|
||||||
${model} · ${tokGen} tokenia · ${typeof durMs === 'number' ? durMs.toFixed(0) : durMs}ms · ${tokS} tok/s
|
${model} · ${tokGen} tokenia · ${typeof durMs === 'number' ? durMs.toFixed(0) : durMs}ms · ${tokS} tok/s
|
||||||
</div>`;
|
</div>`;
|
||||||
@@ -1194,7 +1275,10 @@
|
|||||||
setStep('step-wasm', 'active');
|
setStep('step-wasm', 'active');
|
||||||
|
|
||||||
try {
|
try {
|
||||||
|
if (!wasmInitialized) {
|
||||||
await init();
|
await init();
|
||||||
|
wasmInitialized = true;
|
||||||
|
}
|
||||||
setStep('step-wasm', 'done');
|
setStep('step-wasm', 'done');
|
||||||
setStep('step-tokenizer', 'active');
|
setStep('step-tokenizer', 'active');
|
||||||
|
|
||||||
@@ -1208,7 +1292,15 @@
|
|||||||
selected_task: coderSize === '3b' ? 'qwen-coder-3b' : 'qwen-coder-05b'
|
selected_task: coderSize === '3b' ? 'qwen-coder-3b' : 'qwen-coder-05b'
|
||||||
};
|
};
|
||||||
const taskId = coderSize === '3b' ? 5 : 4;
|
const taskId = coderSize === '3b' ? 5 : 4;
|
||||||
await start_agent_node(wsUrl, false, JSON.stringify(deviceInfo), taskId);
|
// Detect WebGPU on the code lab side as well
|
||||||
|
let coderHasWebGPU = false;
|
||||||
|
if (navigator.gpu) {
|
||||||
|
try {
|
||||||
|
const adapter = await navigator.gpu.requestAdapter();
|
||||||
|
coderHasWebGPU = !!adapter;
|
||||||
|
} catch(e) {}
|
||||||
|
}
|
||||||
|
await start_agent_node(wsUrl, coderHasWebGPU, JSON.stringify(deviceInfo), taskId);
|
||||||
document.getElementById('coder-status').textContent = 'Connected';
|
document.getElementById('coder-status').textContent = 'Connected';
|
||||||
document.getElementById('coder-status').style.color = '#d29922';
|
document.getElementById('coder-status').style.color = '#d29922';
|
||||||
coderWsReady = true;
|
coderWsReady = true;
|
||||||
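Review note: the WebGPU probe added in this hunk swallows errors in an empty `catch`. A sketch of the same check as a reusable function follows; it takes a navigator-like object so it can be exercised outside a browser. `detectWebGPU` is a hypothetical name, and it relies only on `navigator.gpu.requestAdapter()`, which resolves to `null` when no adapter is available.

```javascript
// Resolve to true only if the WebGPU API exists and an adapter is returned.
async function detectWebGPU(nav) {
  if (!nav || !nav.gpu) return false;
  try {
    const adapter = await nav.gpu.requestAdapter();
    return !!adapter;
  } catch (e) {
    // Log instead of silently ignoring, then fall back to CPU.
    console.warn('WebGPU adapter request failed:', e);
    return false;
  }
}
```

The call site would become `const coderHasWebGPU = await detectWebGPU(navigator);`.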
@@ -1225,6 +1317,24 @@
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// JSON mode toggle
|
||||||
|
const jsonToggle = document.getElementById('json-mode-toggle');
|
||||||
|
const jsonHelp = document.getElementById('json-help');
|
||||||
|
const textInput = document.getElementById('code-input');
|
||||||
|
const jsonInput = document.getElementById('code-input-json');
|
||||||
|
|
||||||
|
jsonToggle?.addEventListener('change', () => {
|
||||||
|
if (jsonToggle.checked) {
|
||||||
|
textInput.style.display = 'none';
|
||||||
|
jsonInput.style.display = 'block';
|
||||||
|
jsonHelp.style.display = 'block';
|
||||||
|
} else {
|
||||||
|
textInput.style.display = 'block';
|
||||||
|
jsonInput.style.display = 'none';
|
||||||
|
jsonHelp.style.display = 'none';
|
||||||
|
}
|
||||||
|
});
|
||||||
|
|
||||||
function sendCodeToHub(text) {
|
function sendCodeToHub(text) {
|
||||||
if (uiSocket && uiSocket.readyState === 1) {
|
if (uiSocket && uiSocket.readyState === 1) {
|
||||||
uiSocket.send(JSON.stringify({ type: 'user_text', text: text, task_type: 'qwen-coder' }));
|
uiSocket.send(JSON.stringify({ type: 'user_text', text: text, task_type: 'qwen-coder' }));
|
||||||
@@ -1232,15 +1342,34 @@
|
|||||||
}
|
}
|
||||||
|
|
||||||
async function handleCodeSubmit() {
|
async function handleCodeSubmit() {
|
||||||
const text = codeInput.value.trim();
|
let promptText;
|
||||||
if (!text) return;
|
|
||||||
codeInput.value = '';
|
if (jsonToggle.checked) {
|
||||||
|
// JSON mode
|
||||||
|
const raw = jsonInput.value.trim();
|
||||||
|
if (!raw) return;
|
||||||
|
try {
|
||||||
|
const parsed = JSON.parse(raw);
|
||||||
|
if (!parsed.prompt) { alert('JSON must contain a "prompt" field'); return; }
|
||||||
|
// Send the whole JSON to the hub; the node reads the prompt and parameters
|
||||||
|
promptText = raw;
|
||||||
|
} catch(e) {
|
||||||
|
alert('Invalid JSON: ' + e.message);
|
||||||
|
return;
|
||||||
|
}
|
||||||
|
} else {
|
||||||
|
// Text mode
|
||||||
|
promptText = textInput.value.trim();
|
||||||
|
if (!promptText) return;
|
||||||
|
textInput.value = '';
|
||||||
|
}
|
||||||
|
|
||||||
codeSendBtn.disabled = true;
|
codeSendBtn.disabled = true;
|
||||||
codeSendBtn.textContent = 'Generating...';
|
codeSendBtn.textContent = 'Generating...';
|
||||||
codeLoading.style.display = 'block';
|
codeLoading.style.display = 'block';
|
||||||
|
|
||||||
if (!coderJoined) {
|
if (!coderJoined) {
|
||||||
pendingCodePrompt = text;
|
pendingCodePrompt = promptText;
|
||||||
const dlSize = coderSize === '3b' ? '~6.2 GB' : '~990 MB';
|
const dlSize = coderSize === '3b' ? '~6.2 GB' : '~990 MB';
|
||||||
codeLoading.textContent = `Loading Qwen2.5-Coder-${coderSize === '3b' ? '3B' : '0.5B'} (${dlSize} on first run)...`;
|
codeLoading.textContent = `Loading Qwen2.5-Coder-${coderSize === '3b' ? '3B' : '0.5B'} (${dlSize} on first run)...`;
|
||||||
await ensureCoderNode();
|
await ensureCoderNode();
|
||||||
@@ -1248,12 +1377,12 @@
|
|||||||
codeLoading.textContent = 'Generating code...';
|
codeLoading.textContent = 'Generating code...';
|
||||||
document.getElementById('coder-status').textContent = 'Computing';
|
document.getElementById('coder-status').textContent = 'Computing';
|
||||||
document.getElementById('coder-status').style.color = 'var(--success-color)';
|
document.getElementById('coder-status').style.color = 'var(--success-color)';
|
||||||
sendCodeToHub(text);
|
sendCodeToHub(promptText);
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
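Review note: in JSON mode the handler only checks that `parsed.prompt` is truthy, and input such as `null` parses successfully but then throws on `parsed.prompt`, which surfaces as a misleading "Invalid JSON" alert. A minimal validation sketch follows, assuming the fields listed in the "JSON syntax" help panel (`prompt`, `system`, `max_tokens`, `language`) and its stated default of 128 tokens; `parseCodeRequest` is a hypothetical helper, and how the node actually interprets these fields is not shown in this diff.

```javascript
// Parse and validate a JSON-mode request. Returns { ok: true, request } or
// { ok: false, error } instead of throwing, so the caller decides how to report.
function parseCodeRequest(raw, defaults = { max_tokens: 128 }) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    return { ok: false, error: 'Invalid JSON: ' + e.message };
  }
  if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
    return { ok: false, error: 'JSON must be an object' };
  }
  if (typeof parsed.prompt !== 'string' || parsed.prompt.trim() === '') {
    return { ok: false, error: 'JSON must contain a non-empty "prompt" string' };
  }
  const maxTokens = parsed.max_tokens === undefined ? defaults.max_tokens : parsed.max_tokens;
  if (!Number.isInteger(maxTokens) || maxTokens <= 0) {
    return { ok: false, error: '"max_tokens" must be a positive integer' };
  }
  return {
    ok: true,
    request: { ...parsed, prompt: parsed.prompt.trim(), max_tokens: maxTokens },
  };
}
```

The handler could then send `JSON.stringify(result.request)` when `result.ok` is true and show `result.error` otherwise.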
|
|
||||||
codeSendBtn?.addEventListener('click', handleCodeSubmit);
|
codeSendBtn?.addEventListener('click', handleCodeSubmit);
|
||||||
codeInput?.addEventListener('keydown', (e) => { if (e.key === 'Enter') handleCodeSubmit(); });
|
textInput?.addEventListener('keydown', (e) => { if (e.key === 'Enter') handleCodeSubmit(); });
|
||||||
</script>
|
</script>
|
||||||
</body>
|
</body>
|
||||||
</html>
|
</html>
|
||||||