initial commit: agentic office

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CodeBench: --boss orchestrator: the large model analyzes, the small one fixes
2026-04-15 13:14:39 +03:00 · 2026-04-15 00:51:31 +03:00 · 2026-04-15 00:37:34 +03:00 · 2026-04-15 00:28:57 +03:00 · 2026-04-15 00:21:36 +03:00 · 2026-04-15 00:12:57 +03:00
356 changed files with 27321 additions and 23415 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -37,3 +37,12 @@ Cargo.lock

 # Runtime databases
 *.db
+
+# Log files
+*.log
+
+# Old version
+temp/
+
+# Other
+zipit/**
--- a/TEMPLATING.md
+++ b/TEMPLATING.md
@@ -0,0 +1,157 @@
+# Templating: building blocks for code generation
+
+## Basic principle
+
+The language model decides **what** gets built (entities, fields, types, relationships).
+Template functions decide **how** it gets built (imports, engine setup, test configuration).
+
+```
+Project description → LLM → JSON spec → Templates → Code → Validation
+```
+
+The LLM contributes a single JSON structure. Everything else is deterministic:
+the same spec always produces the same code.
+
+## Why this works
+
+The strengths and weaknesses of a small language model (0.5B–7B parameters) are asymmetric:
+
+| Task | LLM ability | Solution |
+|---------|-----------|----------|
+| Identify entities from a description | Good | LLM does it |
+| Choose field types | Good | LLM does it |
+| Get imports right | Poor | Template does it |
+| SQLite connect_args | Poor | Template does it |
+| Test configuration | Poor | Template does it |
+| Dockerfile structure | Poor | Template does it |
+
+Let the model do what it is good at. Handle the rest mechanically.
+
+## JSON spec
+
+The language model's only output is a JSON document that describes the project structure:
+
+```json
+{
+  "project_name": "library-app",
+  "entities": [
+    {
+      "name": "Author",
+      "table_name": "authors",
+      "fields": [
+        {"name": "name", "sa_type": "String(255)", "py_type": "str", "nullable": false, "default": null}
+      ]
+    },
+    {
+      "name": "Book",
+      "table_name": "books",
+      "fields": [
+        {"name": "title", "sa_type": "String(255)", "py_type": "str", "nullable": false, "default": null},
+        {"name": "author_id", "sa_type": "Integer", "py_type": "int", "nullable": false, "default": null}
+      ]
+    }
+  ],
+  "relationships": [
+    {"from": "Book", "field": "author_id", "to": "Author", "type": "many-to-one"}
+  ],
+  "extra_imports": []
+}
+```
+
+The quality of the spec decides everything. A good spec yields a good project. A bad spec yields
+a project that works technically but has the wrong content.
+
+## Role of the Architect prompt
+
+The Architect agent (the generator of the JSON spec) is the most critical point in the whole pipeline.
+It is steered in four ways:
+
+1. **Chain-of-thought**: the model first works out the entities, then the fields,
+   then the relationships, and only then writes the JSON
+2. **Domain examples**: todo, online shop, blog, so the model sees what
+   a good spec looks like in different domains
+3. **Anti-patterns**: redundant ID fields, Enum types, Finnish-language names
+4. **Relationship rules**: every `_id` field needs a matching relationship entry
+
+A larger model at this single step would improve the quality of every project.
+
+## Templates
+
+Each template is a function that takes the spec and returns code:
+
+```
+tmplModels(spec)     → models.py      (SQLAlchemy, ForeignKey, relationship)
+tmplSchemas(spec)    → schemas.py     (Pydantic Create/Response/Detail)
+tmplMain(spec)       → main.py        (FastAPI CRUD + nested endpoints + FK validation)
+tmplTests(spec)      → test_main.py   (pytest + TestClient + helper functions)
+tmplPyproject(spec)  → pyproject.toml (PEP 621)
+tmplDockerfile()     → Dockerfile     (uv + non-root user)
+```
+
+The templates automatically generate:
+- ForeignKey constraints and relationship() definitions
+- Nested endpoints (`GET /authors/{id}/books/`)
+- FK validation (404 if the parent entity does not exist)
+- Detail schemas (Book with its author data included)
+- Test helpers that create parent entities first
+- Bad-FK tests (verifying that orphan validation works)
+
+## Validation
+
+Generated code is validated mechanically before use:
+
+- Syntax check (AST parse)
+- Project-internal imports (does the name exist in the source file)
+- SQLite connect_args
+- Relative imports (forbidden)
+- Test structure (tests must not copy the app)
+- pyproject.toml (no Poetry)
+- Dockerfile (no Poetry, uv cache permissions)
+
+The Docker test runs the whole project: build → pytest → API smoke test.
+
+## Limitations
+
+The templates cover projects whose structure is known in advance:
+
+| Stack | Coverage |
+|-------|-----------|
+| FastAPI + SQLAlchemy CRUD | Works well |
+| Streamlit + DuckDB dashboard | Works well |
+| Other | No template, so it does not work |
+
+**Not covered:**
+- Custom business logic (algorithms, computation, ML)
+- Atypical architectures (WebSocket, graphs, event-driven)
+- Frontend applications (React, Vue)
+- Anything the template does not know about
+
+Estimate: the templates cover ~20% of all possible projects, but it is exactly
+the 20% most often needed in teaching and prototyping environments.
+
+## Extending
+
+Adding a new stack requires:
+
+1. New template functions (handwritten, ~200–400 lines per stack)
+2. An extension to the JSON spec (new fields if needed)
+3. Validation rules for the new stack
+4. Docker test configuration
+
+Every template is static: it neither learns nor adapts. Coverage grows
+only by writing more templates.
+
+## Hybrid: the next step
+
+The best result would come from a combination:
+
+```
+Spec → Template (skeleton) → LLM (business logic) → Validation
+```
+
+The template produces a working CRUD base. The LLM adds domain-specific logic
+in small pieces (one function at a time). Mechanical validation
+checks every addition.
+
+This brings the LLM's unreliability back into play, but in bounded form:
+errors are local (one function) rather than structural (the whole project).
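Note on TEMPLATING.md: the claim that the same spec always yields the same code can be illustrated with a small Python sketch of a `tmplModels`-style function. Everything in it is an assumption made for illustration, since the real template functions are not part of this diff: the name `tmpl_models`, the fixed `HEADER`, the automatic `id` column, and the rule that a field listed under `relationships` becomes a `ForeignKey` to the target table's `id`.

```python
import ast

# Boilerplate owned by the template, so the model never has to recall it.
HEADER = '''\
from sqlalchemy import ForeignKey, Integer, String, Text, create_engine
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, sessionmaker

DATABASE_URL = "sqlite:///./app.db"
engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)


class Base(DeclarativeBase):
    pass
'''

# The library-app spec from TEMPLATING.md, written as a Python dict.
SPEC = {
    "project_name": "library-app",
    "entities": [
        {"name": "Author", "table_name": "authors", "fields": [
            {"name": "name", "sa_type": "String(255)", "py_type": "str",
             "nullable": False, "default": None},
        ]},
        {"name": "Book", "table_name": "books", "fields": [
            {"name": "title", "sa_type": "String(255)", "py_type": "str",
             "nullable": False, "default": None},
            {"name": "author_id", "sa_type": "Integer", "py_type": "int",
             "nullable": False, "default": None},
        ]},
    ],
    "relationships": [
        {"from": "Book", "field": "author_id", "to": "Author", "type": "many-to-one"},
    ],
    "extra_imports": [],
}


def tmpl_models(spec: dict) -> str:
    """Render the text of models.py from a spec dict (hypothetical helper)."""
    tables = {e["name"]: e["table_name"] for e in spec["entities"]}
    # (entity name, field name) -> target table, read from the relationships list.
    fks = {
        (r["from"], r["field"]): tables[r["to"]]
        for r in spec.get("relationships", [])
    }
    parts = [HEADER]
    for entity in spec["entities"]:
        class_name = entity["name"]
        table = entity["table_name"]
        lines = [
            "",
            "",
            f"class {class_name}(Base):",
            f'    __tablename__ = "{table}"',
            "",
            "    id: Mapped[int] = mapped_column(primary_key=True, index=True)",
        ]
        for field in entity["fields"]:
            name = field["name"]
            py_type = field["py_type"] + (" | None" if field["nullable"] else "")
            target = fks.get((class_name, name))
            # Assumed rule: a related field becomes a ForeignKey to "<table>.id".
            column = f'ForeignKey("{target}.id")' if target else field["sa_type"]
            lines.append(f"    {name}: Mapped[{py_type}] = mapped_column({column})")
        parts.append("\n".join(lines))
    return "".join(parts) + "\n\n\nBase.metadata.create_all(bind=engine)\n"


generated = tmpl_models(SPEC)
ast.parse(generated)  # raises SyntaxError if the template emitted invalid Python
```

Because the function only reads the spec and concatenates strings, calling it twice with the same spec returns identical text. The `ast.parse` call corresponds to the syntax check listed under Validation; it does not import SQLAlchemy or run the generated module.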
--- a/kipina-codebench/Dockerfile.cargo-test
+++ b/kipina-codebench/Dockerfile.cargo-test
@@ -0,0 +1,4 @@
+FROM rust:latest
+RUN apt-get update && apt-get install -y pkg-config libssl-dev cmake && rm -rf /var/lib/apt/lists/*
+WORKDIR /work
+ENTRYPOINT ["sh", "-c", "cp -r /src/* . && cargo test 2>&1"]
--- a/kipina-codebench/Dockerfile.go-test
+++ b/kipina-codebench/Dockerfile.go-test
@@ -0,0 +1,4 @@
+FROM golang:1.23-alpine
+RUN apk add --no-cache gcc musl-dev
+WORKDIR /work
+ENTRYPOINT ["sh", "-c", "cp -r /src/* . && go mod tidy 2>&1 && go test -v -count=1 ./... 2>&1"]
--- a/kipina-codebench/Dockerfile.pytest
+++ b/kipina-codebench/Dockerfile.pytest
@@ -0,0 +1,5 @@
+FROM python:3.14-slim
+COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
+WORKDIR /work
+ENV PYTHONPATH=/work
+ENTRYPOINT ["sh", "-c", "uv init --no-readme --python '>=3.14' 2>/dev/null && rm -f hello.py main.py && uv add fastapi 'uvicorn[standard]' sqlalchemy pytest httpx 2>/dev/null && cp /src/*.py . && rm -f app.db test.db && uv run pytest test_main.py -v --tb=short 2>&1"]
--- a/kipina-codebench/README.md
+++ b/kipina-codebench/README.md
@@ -0,0 +1,95 @@
+# Kipinä CodeBench
+
+An LLM code-generation benchmark. It tests how well Ollama models generate working FastAPI + SQLAlchemy projects and runs their tests in a Docker container.
+
+## Quick start
+
+```bash
+# 1. Build the Docker test container
+docker build -t kipina-pytest -f Dockerfile.pytest .
+
+# 2. Run the benchmark
+node benchmark.mjs --ollama http://localhost:11434 --scenarios all
+
+# 3. Open the report
+open /tmp/kipina-benchmark/report.html
+```
+
+## Pipeline
+
+```
+1. LLM → requirements specification (prompts/client.md)
+2. LLM → JSON spec (prompts/spec.md)
+3. LLM → 4 Python files (prompts/code.md + golden-examples/)
+4. Static validation + LLM fix (prompts/fix.md)
+5. Docker: uv init + uv add + pytest
+```
+
+## CLI arguments
+
+| Argument | Default | Description |
+|-----------|--------|--------|
+| `--ollama` | `http://localhost:11434` | Ollama server URL |
+| `--hub` | - | Hub route (alternative to Ollama) |
+| `--models` | all | Comma-separated list of models |
+| `--scenarios` | `default` (todo) | `all` = todo, users, blog |
+| `--output` | `/tmp/kipina-benchmark` | Output directory |
+
+## Directory structure
+
+```
+kipina-codebench/
+├── benchmark.mjs            ← runner
+├── Dockerfile.pytest        ← Python 3.14 + uv test container
+├── report-template.html     ← HTML report template
+├── package.json
+├── prompts/                 ← editable prompts
+│   ├── client.md            ← requirements specification
+│   ├── spec.md              ← JSON spec
+│   ├── code.md              ← code generation
+│   └── fix.md               ← fix
+├── golden-examples/         ← reference implementations
+│   ├── todo/                ← level 1: basic CRUD (6 tests)
+│   ├── blog/                ← level 2: relations (13 tests)
+│   └── DOCUMENTATION.md     ← zensical documentation guidelines
+└── results/                 ← saved results
+```
+
+## Editing the prompts
+
+The prompts are Markdown files in the `prompts/` directory. Edit them directly; the benchmark loads them at startup.
+
+Example: add a rule to `prompts/code.md`:
+```
+- Tests: PUT/update test data MUST include ALL required fields
+```
+
+## Golden examples
+
+`golden-examples/todo/` is fed to the LLM as a reference. The model sees exactly what kind of code is expected:
+- SQLAlchemy 2.0 (DeclarativeBase, Mapped, mapped_column)
+- Pydantic v2 (ConfigDict)
+- Python 3.14 with modern union syntax (`str | None`, valid since Python 3.10)
+- Unique test data per test
+
+Add new examples by creating a directory (e.g. `golden-examples/shop/`).
+
+## Scoring
+
+| Component | Points | Basis |
+|---|---|---|
+| Spec OK | 10 pts | JSON spec succeeded |
+| Code generated | 10 pts | All 4 files were produced |
+| Tests | 0–60 pts | passed/total × 60 |
+| Fixes | 0–20 pts | 0 rounds = 20 pts, 1 = 10 pts, 2+ = 0 pts |
+
+Stars: ★★★★★ (90+), ★★★★☆ (70+), ★★★☆☆ (50+), ★★☆☆☆ (25+), ★☆☆☆☆ (1+)
+
+## Use as a git submodule
+
+```bash
+git submodule add <repo-url> tools/codebench
+cd tools/codebench
+docker build -t kipina-pytest -f Dockerfile.pytest .
+node benchmark.mjs --ollama http://localhost:11434 --scenarios all
+```
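Note on the README scoring table: the arithmetic can be restated as a short Python sketch. The actual scoring lives in `benchmark.mjs`, whose contents are not shown in this diff, so the function names, the rounding of the test component, and the all-empty star string for 0 points are assumptions rather than the runner's behavior.

```python
def score(spec_ok: bool, files_generated: int, passed: int, total: int,
          fix_rounds: int) -> int:
    """Total points (0-100) following the README scoring table."""
    points = 0
    if spec_ok:
        points += 10                      # Spec OK
    if files_generated == 4:
        points += 10                      # Code generated: all 4 files exist
    if total > 0:
        # Tests: passed/total x 60. Rounding to an integer is an assumption.
        points += round(passed / total * 60)
    # Fixes: 0 rounds = 20, 1 round = 10, 2 or more = 0.
    points += {0: 20, 1: 10}.get(fix_rounds, 0)
    return points


def stars(points: int) -> str:
    """Map points to the five-star string from the README thresholds."""
    for limit, filled in ((90, 5), (70, 4), (50, 3), (25, 2), (1, 1)):
        if points >= limit:
            return "★" * filled + "☆" * (5 - filled)
    return "☆" * 5  # assumption: the README defines no rating for 0 points
```

For example, a run with a valid spec, all four files, 13 of 13 tests passing and no fix rounds reaches the full 100 points, while the same run with two fix rounds and half the tests passing lands at 50.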
--- a/kipina-codebench/benchmark.mjs
+++ b/kipina-codebench/benchmark.mjs
--- a/kipina-codebench/golden-examples/DOCUMENTATION.md
+++ b/kipina-codebench/golden-examples/DOCUMENTATION.md
@@ -0,0 +1,84 @@
+# Documentation guidelines: Zensical
+
+Good documentation says **what a thing IS**, not what it does. It is like a Zen koan: short, precise, sufficient.
+
+## Principles
+
+1. **One line is enough.** If you need a paragraph, the code is too complex.
+2. **Say what, not how.** Write `"""Database models: SQLAlchemy 2.0, SQLite."""`, not `"""This module creates database models using SQLAlchemy..."""`
+3. **Do not repeat the code.** If the function is `create_todo`, the docstring is not "Creates a todo".
+4. **Finnish or English, not both.** Pick one language per project.
+5. **No filler words.** "This module provides functionality for" → delete.
+
+## What to document
+
+| Target | Documentation | Example |
+|-------|--------------|-----------|
+| **Module** (.py) | Always. One line: what the file contains. | `"""Pydantic v2 schemas: Create and Response."""` |
+| **Class** | Always. What the entity represents. | `"""Task: title, deadline, priority."""` |
+| **Function** | Only if the name does not say everything. | `get_db` → `"""Database session per request."""` |
+| **CRUD endpoint** | No. Name + HTTP method is enough. | `create_todo`, `list_todos` are self-documenting |
+| **Test** | No. The test name is the documentation. | `test_get_todo_not_found` is clear |
+| **Configuration** | A comment only if the value is surprising. | `check_same_thread: False  # SQLite + FastAPI` |
+
+## What NOT to document
+
+- Imports
+- Obvious parameters (`item_id: int`)
+- Type hints that say the same thing
+- Generic boilerplate docstrings
+
+## Examples
+
+### Good (zensical)
+
+```python
+"""Database models: SQLAlchemy 2.0, Mapped typing, SQLite."""
+
+class Todo(Base):
+    """Task: title, description, deadline, priority and status."""
+    ...
+
+def get_db():
+    """Database session per request."""
+    ...
+```
+
+### Bad (verbose)
+
+```python
+"""
+This module defines the database models for the Todo application.
+It uses SQLAlchemy ORM to create the database tables and provides
+the session factory for database connections.
+"""
+
+class Todo(Base):
+    """
+    Represents a todo item in the database.
+
+    Attributes:
+        id: The unique identifier for the todo item.
+        title: The title of the todo item.
+        ...
+    """
+    ...
+```
+
+### Bad (empty)
+
+```python
+# No docstrings at all: the reader cannot tell what the file's role is
+class Todo(Base):
+    __tablename__ = "todos"
+    ...
+```
+
+## Checklist
+
+Generated code is well documented when:
+- [ ] Every .py file starts with a one-line docstring
+- [ ] Every class says what entity it represents
+- [ ] Docstrings are in the same language as the rest of the code
+- [ ] CRUD endpoints carry no unnecessary docstrings
+- [ ] Comments appear only where the code is surprising
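Note on DOCUMENTATION.md: the first two checklist items are mechanical enough to check with Python's standard `ast` module. The sketch below is one possible reading of them, not part of the benchmark. The function name `check_docstrings` is hypothetical, and skipping classes whose body is only `pass` (such as `Base`) is an assumption, since the checklist speaks of entities.

```python
import ast


def check_docstrings(source: str) -> list[str]:
    """Checklist violations for one Python source string."""
    tree = ast.parse(source)
    problems: list[str] = []

    # Item 1: every .py file starts with a one-line docstring.
    module_doc = ast.get_docstring(tree)
    if module_doc is None:
        problems.append("module: missing docstring")
    elif "\n" in module_doc.strip():
        problems.append("module: docstring is longer than one line")

    # Item 2: every class says what entity it represents.
    for node in ast.walk(tree):
        if not isinstance(node, ast.ClassDef):
            continue
        # Assumption: a class whose body is a bare `pass` is not an entity.
        only_pass = len(node.body) == 1 and isinstance(node.body[0], ast.Pass)
        if not only_pass and ast.get_docstring(node) is None:
            problems.append(f"class {node.name}: missing docstring")
    return problems
```

Run against the "empty" example above, it reports both the missing module docstring and the undocumented `Todo` class; the "good" example yields an empty list. The remaining checklist items (language consistency, unnecessary docstrings, surprising code) need human judgment and are left out.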
--- a/kipina-codebench/golden-examples/README.md
+++ b/kipina-codebench/golden-examples/README.md
@@ -0,0 +1,124 @@
+# Golden Examples: reference implementations
+
+Golden examples are **complete, tested** FastAPI projects that the LLM uses as a model for code generation. The model sees the example and produces the same structure for a new project.
+
+## Creating a new example
+
+### 1. Create a directory
+
+```bash
+mkdir golden-examples/shop
+```
+
+Name the directory after the scenario (todo, blog, shop, booking...).
+
+### 2. Create 4 files
+
+| File | Contents |
+|----------|---------|
+| `models.py` | SQLAlchemy 2.0 models (DeclarativeBase, Mapped, mapped_column) |
+| `schemas.py` | Pydantic v2 schemas (ConfigDict, `str \| None` syntax) |
+| `main.py` | FastAPI CRUD endpoints (POST 201, GET, GET/:id 404, PUT, DELETE 204) |
+| `test_main.py` | Pytest + TestClient, separate test.db, unique data per test |
+
+### 3. Follow the conventions
+
+**Python version:** >=3.14
+
+**SQLAlchemy 2.0** (not legacy):
+```python
+# Correct
+class Base(DeclarativeBase):
+    pass
+
+class Todo(Base):
+    __tablename__ = "todos"  # required for a mapped class
+    id: Mapped[int] = mapped_column(primary_key=True, index=True)
+    title: Mapped[str] = mapped_column(String(255))
+    status: Mapped[str] = mapped_column(String(20), default="pending")
+
+# Wrong
+Base = declarative_base()
+id = Column(Integer, primary_key=True)
+```
+
+**Pydantic v2** (not v1):
+```python
+# Correct
+class TodoResponse(TodoCreate):
+    id: int
+    model_config = ConfigDict(from_attributes=True)
+
+# Wrong
+class Config:
+    orm_mode = True
+```
+
+**Typing:**
+```python
+# Correct
+description: Mapped[str | None] = mapped_column(Text, default=None)
+
+# Wrong
+description: Mapped[Optional[str]]
+```
+
+**Documentation (zensical):**
+```python
+"""Database models: SQLAlchemy 2.0, Mapped typing, SQLite."""
+
+class Todo(Base):
+    """Task: title, description, deadline, priority and status."""
+```
+
+One line is enough. Say what the thing IS, not what it does. See [DOCUMENTATION.md](DOCUMENTATION.md).
+
+**Test data (unique and descriptive):**
+```python
+# Correct
+def test_create_todo():
+    response = client.post("/todos/", json={"title": "Osta maitoa", "priority": 2})
+
+def test_update_todo():
+    created = client.post("/todos/", json={"title": "Vanha otsikko"}).json()
+
+# Wrong: generic data
+def test_create_todo():
+    response = client.post("/todos/", json={"title": "test", "priority": 1})
+```
+
+### 4. Test in the Docker container
+
+```bash
+rm -rf /tmp/golden-test && mkdir /tmp/golden-test
+cp golden-examples/shop/*.py /tmp/golden-test/
+docker run --rm -v /tmp/golden-test:/src:ro kipina-pytest
+```
+
+**All tests must pass.** No warnings, no deprecation messages.
+
+### 5. Difficulty levels
+
+| Level | Examples | Challenge |
+|------|-----------|--------|
+| 1: Basic CRUD | `todo/`, `users/`, `notes/` | One entity |
+| 2: Relations | `blog/`, `library/`, `school/` | Foreign key, 2–3 entities |
+| 3: Business logic | `shop/`, `booking/` | Custom endpoints, validation |
+
+Start at level 1 and work up. Level 1 examples must be simple, because they teach the model the basic structure.
+
+## How the examples take effect
+
+The benchmark loads the `todo/` example and feeds it to the LLM as part of the code-generation prompt:
+
+```
+REFERENCE IMPLEMENTATION (todo project — follow this exact structure):
+
+=== models.py ===
+<contents of todo/models.py>
+
+=== schemas.py ===
+...
+```
+
+The model sees an exact example and produces the same structure for a new project. The better the example, the better the result.
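Note on golden-examples/README.md: the reference block shown under "How the examples take effect" could be assembled roughly as follows. This is a Python sketch of the idea only. The runner is `benchmark.mjs` and its code is not shown here, so the helper name `build_reference_block`, the file order after `schemas.py`, and passing the header line in as an argument are all assumptions.

```python
from pathlib import Path

# Order follows the README excerpt (models.py, schemas.py); the rest is assumed.
EXAMPLE_FILES = ("models.py", "schemas.py", "main.py", "test_main.py")


def build_reference_block(example_dir: Path, header: str) -> str:
    """Join a golden example's files into one prompt section (hypothetical)."""
    sections = [header, ""]
    for name in EXAMPLE_FILES:
        path = example_dir / name
        if not path.is_file():
            continue  # skip files the example does not provide
        sections.append(f"=== {name} ===")
        sections.append(path.read_text(encoding="utf-8").rstrip())
        sections.append("")
    return "\n".join(sections)
```

A caller would pass the example directory (for instance `Path("golden-examples/todo")`) together with the header line it wants at the top of the block, and append the result to the code-generation prompt.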
--- a/kipina-codebench/golden-examples/blog/main.py
+++ b/kipina-codebench/golden-examples/blog/main.py
@@ -0,0 +1,110 @@
+"""FastAPI CRUD: two endpoint sets, Author and Post."""
+
+from fastapi import FastAPI, Depends, HTTPException
+from sqlalchemy.orm import Session
+
+from models import SessionLocal, Author, Post
+from schemas import AuthorCreate, AuthorResponse, PostCreate, PostResponse
+
+app = FastAPI()
+
+
+def get_db():
+    """Database session per request."""
+    db = SessionLocal()
+    try:
+        yield db
+    finally:
+        db.close()
+
+
+# --- Author ---
+
+
+@app.post("/authors/", response_model=AuthorResponse, status_code=201)
+def create_author(item: AuthorCreate, db: Session = Depends(get_db)):
+    db_item = Author(**item.model_dump())
+    db.add(db_item)
+    db.commit()
+    db.refresh(db_item)
+    return db_item
+
+
+@app.get("/authors/", response_model=list[AuthorResponse])
+def list_authors(db: Session = Depends(get_db)):
+    return db.query(Author).all()
+
+
+@app.get("/authors/{item_id}", response_model=AuthorResponse)
+def get_author(item_id: int, db: Session = Depends(get_db)):
+    item = db.query(Author).filter(Author.id == item_id).first()
+    if not item:
+        raise HTTPException(status_code=404, detail="Author not found")
+    return item
+
+
+@app.put("/authors/{item_id}", response_model=AuthorResponse)
+def update_author(item_id: int, item: AuthorCreate, db: Session = Depends(get_db)):
+    db_item = db.query(Author).filter(Author.id == item_id).first()
+    if not db_item:
+        raise HTTPException(status_code=404, detail="Author not found")
+    for key, value in item.model_dump().items():
+        setattr(db_item, key, value)
+    db.commit()
+    db.refresh(db_item)
+    return db_item
+
+
+@app.delete("/authors/{item_id}", status_code=204)
+def delete_author(item_id: int, db: Session = Depends(get_db)):
+    db_item = db.query(Author).filter(Author.id == item_id).first()
+    if not db_item:
+        raise HTTPException(status_code=404, detail="Author not found")
+    db.delete(db_item)
+    db.commit()
+
+
+# --- Post ---
+
+
+@app.post("/posts/", response_model=PostResponse, status_code=201)
+def create_post(item: PostCreate, db: Session = Depends(get_db)):
+    db_item = Post(**item.model_dump())
+    db.add(db_item)
+    db.commit()
+    db.refresh(db_item)
+    return db_item
+
+
+@app.get("/posts/", response_model=list[PostResponse])
+def list_posts(db: Session = Depends(get_db)):
+    return db.query(Post).all()
+
+
+@app.get("/posts/{item_id}", response_model=PostResponse)
+def get_post(item_id: int, db: Session = Depends(get_db)):
+    item = db.query(Post).filter(Post.id == item_id).first()
+    if not item:
+        raise HTTPException(status_code=404, detail="Post not found")
+    return item
+
+
+@app.put("/posts/{item_id}", response_model=PostResponse)
+def update_post(item_id: int, item: PostCreate, db: Session = Depends(get_db)):
+    db_item = db.query(Post).filter(Post.id == item_id).first()
+    if not db_item:
+        raise HTTPException(status_code=404, detail="Post not found")
+    for key, value in item.model_dump().items():
+        setattr(db_item, key, value)
+    db.commit()
+    db.refresh(db_item)
+    return db_item
+
+
+@app.delete("/posts/{item_id}", status_code=204)
+def delete_post(item_id: int, db: Session = Depends(get_db)):
+    db_item = db.query(Post).filter(Post.id == item_id).first()
+    if not db_item:
+        raise HTTPException(status_code=404, detail="Post not found")
+    db.delete(db_item)
+    db.commit()
--- a/kipina-codebench/golden-examples/blog/models.py
+++ b/kipina-codebench/golden-examples/blog/models.py
@@ -0,0 +1,45 @@
+"""Database models: SQLAlchemy 2.0, Mapped typing, ForeignKey relations."""
+
+from datetime import datetime
+
+from sqlalchemy import String, Text, DateTime, ForeignKey, create_engine
+from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship, sessionmaker
+
+DATABASE_URL = "sqlite:///./app.db"
+engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
+SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
+
+
+class Base(DeclarativeBase):
+    pass
+
+
+class Author(Base):
+    """Author: name, email and bio."""
+
+    __tablename__ = "authors"
+
+    id: Mapped[int] = mapped_column(primary_key=True, index=True)
+    name: Mapped[str] = mapped_column(String(255))
+    email: Mapped[str] = mapped_column(String(255), unique=True)
+    bio: Mapped[str | None] = mapped_column(Text, default=None)
+
+    posts: Mapped[list["Post"]] = relationship(back_populates="author")
+
+
+class Post(Base):
+    """Blog post: title, content, author, publication time and status."""
+
+    __tablename__ = "posts"
+
+    id: Mapped[int] = mapped_column(primary_key=True, index=True)
+    title: Mapped[str] = mapped_column(String(255))
+    content: Mapped[str] = mapped_column(Text)
+    author_id: Mapped[int] = mapped_column(ForeignKey("authors.id"))
+    published_at: Mapped[datetime | None] = mapped_column(DateTime, default=None)
+    status: Mapped[str] = mapped_column(String(20), default="draft")
+
+    author: Mapped["Author"] = relationship(back_populates="posts")
+
+
+Base.metadata.create_all(bind=engine)
--- a/kipina-codebench/golden-examples/blog/schemas.py
+++ b/kipina-codebench/golden-examples/blog/schemas.py
@@ -0,0 +1,37 @@
+"""Pydantic v2 schemas: Create for input, Response for output."""
+
+from datetime import datetime
+
+from pydantic import BaseModel, ConfigDict
+
+
+class AuthorCreate(BaseModel):
+    """Creating a new author. Required: name, email."""
+
+    name: str
+    email: str
+    bio: str | None = None
+
+
+class AuthorResponse(AuthorCreate):
+    """Returned author, including its id."""
+
+    id: int
+    model_config = ConfigDict(from_attributes=True)
+
+
+class PostCreate(BaseModel):
+    """Creating a new post. Required: title, content, author_id."""
+
+    title: str
+    content: str
+    author_id: int
+    published_at: datetime | None = None
+    status: str = "draft"
+
+
+class PostResponse(PostCreate):
+    """Returned post, including its id."""
+
+    id: int
+    model_config = ConfigDict(from_attributes=True)
--- a/kipina-codebench/golden-examples/blog/test_main.py
+++ b/kipina-codebench/golden-examples/blog/test_main.py
@@ -0,0 +1,164 @@
+"""Pytest: TestClient, separate test.db, unique data per test."""
+
+from fastapi.testclient import TestClient
+from sqlalchemy import create_engine
+from sqlalchemy.orm import sessionmaker
+
+from main import app, get_db
+from models import Base
+
+test_engine = create_engine(
+    "sqlite:///./test.db", connect_args={"check_same_thread": False}
+)
+TestSession = sessionmaker(autocommit=False, autoflush=False, bind=test_engine)
+Base.metadata.create_all(bind=test_engine)
+
+
+def override_get_db():
+    db = TestSession()
+    try:
+        yield db
+    finally:
+        db.close()
+
+
+app.dependency_overrides[get_db] = override_get_db
+client = TestClient(app)
+
+
+def _create_author(name="Eino Leino", email=None):
+    """Helper for creating an author in tests."""
+    if email is None:
+        email = f"{name.lower().replace(' ', '.')}@example.com"
+    return client.post(
+        "/authors/", json={"name": name, "email": email}
+    ).json()
+
+
+# --- Author tests ---
+
+
+def test_create_author():
+    response = client.post(
+        "/authors/",
+        json={"name": "Aleksis Kivi", "email": "aleksis@example.com", "bio": "Suomen kansalliskirjailija"},
+    )
+    assert response.status_code == 201
+    assert response.json()["name"] == "Aleksis Kivi"
+    assert response.json()["bio"] == "Suomen kansalliskirjailija"
+    assert "id" in response.json()
+
+
+def test_list_authors():
+    _create_author("Minna Canth", "minna.canth@example.com")
+    response = client.get("/authors/")
+    assert response.status_code == 200
+    assert len(response.json()) >= 1
+
+
+def test_get_author_by_id():
+    created = _create_author("Väinö Linna", "vaino.linna@example.com")
+    response = client.get(f"/authors/{created['id']}")
+    assert response.status_code == 200
+    assert response.json()["id"] == created["id"]
+
+
+def test_get_author_not_found():
+    response = client.get("/authors/99999")
+    assert response.status_code == 404
+
+
+def test_update_author():
+    created = _create_author("Vanha Nimi", "vanha.nimi@example.com")
+    response = client.put(
+        f"/authors/{created['id']}",
+        json={"name": "Uusi Nimi", "email": "uusi.nimi@example.com"},
+    )
+    assert response.status_code == 200
+    assert response.json()["name"] == "Uusi Nimi"
+
+
+def test_delete_author():
+    created = _create_author("Poistettava Kirjailija", "poistettava@example.com")
+    response = client.delete(f"/authors/{created['id']}")
+    assert response.status_code == 204
+    response = client.get(f"/authors/{created['id']}")
+    assert response.status_code == 404
+
+
+# --- Post tests ---
+
+
+def test_create_post():
+    author = _create_author("Tove Jansson", "tove.jansson@example.com")
+    response = client.post(
+        "/posts/",
+        json={"title": "Muumipeikko ja pyrstötähti", "content": "Eräänä aamuna...", "author_id": author["id"]},
+    )
+    assert response.status_code == 201
+    assert response.json()["title"] == "Muumipeikko ja pyrstötähti"
+    assert response.json()["author_id"] == author["id"]
+    assert response.json()["status"] == "draft"
+
+
+def test_list_posts():
+    author = _create_author("Juhani Aho", "juhani.aho@example.com")
+    client.post(
+        "/posts/",
+        json={"title": "Rautatie", "content": "Junasta kertova novelli.", "author_id": author["id"]},
+    )
+    response = client.get("/posts/")
+    assert response.status_code == 200
+    assert len(response.json()) >= 1
+
+
+def test_get_post_by_id():
+    author = _create_author("Elias Lönnrot", "elias.lonnrot@example.com")
+    created = client.post(
+        "/posts/",
+        json={"title": "Kalevala", "content": "Vaka vanha Väinämöinen.", "author_id": author["id"]},
+    ).json()
+    response = client.get(f"/posts/{created['id']}")
+    assert response.status_code == 200
+    assert response.json()["id"] == created["id"]
+
+
+def test_get_post_not_found():
+    response = client.get("/posts/99999")
+    assert response.status_code == 404
+
+
+def test_update_post():
+    author = _create_author("Joel Lehtonen", "joel.lehtonen@example.com")
+    created = client.post(
+        "/posts/",
+        json={"title": "Vanha otsikko", "content": "Alkuperäinen teksti.", "author_id": author["id"]},
+    ).json()
+    response = client.put(
+        f"/posts/{created['id']}",
+        json={"title": "Päivitetty otsikko", "content": "Muokattu teksti.", "author_id": author["id"], "status": "published"},
+    )
+    assert response.status_code == 200
+    assert response.json()["title"] == "Päivitetty otsikko"
+    assert response.json()["status"] == "published"
+
+
+def test_delete_post():
+    author = _create_author("Aino Kallas", "aino.kallas@example.com")
+    created = client.post(
+        "/posts/",
+        json={"title": "Poistettava postaus", "content": "Tämä poistetaan.", "author_id": author["id"]},
+    ).json()
+    response = client.delete(f"/posts/{created['id']}")
+    assert response.status_code == 204
+    response = client.get(f"/posts/{created['id']}")
+    assert response.status_code == 404
+
+
+def test_post_belongs_to_author():
+    author = _create_author("Sofi Oksanen", "sofi.oksanen@example.com")
+    post = client.post(
+        "/posts/",
+        json={"title": "Puhdistus", "content": "Romaani Virosta.", "author_id": author["id"]},
+    ).json()
+    assert post["author_id"] == author["id"]
--- a/kipina-codebench/golden-examples/combined-readme.md
+++ b/kipina-codebench/golden-examples/combined-readme.md
@@ -0,0 +1,204 @@
+# Example 1: Todo App (single entity)
+
+## models.py
+
+```python
+"""Database models: SQLAlchemy 2.0, Mapped typing, SQLite."""
+from datetime import date
+from sqlalchemy import String, Text, Date, create_engine
+from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, sessionmaker
+
+DATABASE_URL = "sqlite:///./app.db"
+engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
+SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
+
+class Base(DeclarativeBase):
+    pass
+
+class Todo(Base):
+    __tablename__ = "todos"
+    id: Mapped[int] = mapped_column(primary_key=True, index=True)
+    title: Mapped[str] = mapped_column(String(255))
+    description: Mapped[str | None] = mapped_column(Text, default=None)
+    due_date: Mapped[date | None] = mapped_column(Date, default=None)
+    priority: Mapped[int] = mapped_column(default=1)
+    status: Mapped[str] = mapped_column(String(20), default="pending")
+
+Base.metadata.create_all(bind=engine)
+```
+
+## schemas.py
+
+```python
+from datetime import date
+from pydantic import BaseModel, ConfigDict
+
+class TodoCreate(BaseModel):
+    title: str
+    description: str | None = None
+    due_date: date | None = None
+    priority: int = 1
+    status: str = "pending"
+
+class TodoResponse(TodoCreate):
+    id: int
+    model_config = ConfigDict(from_attributes=True)
+```
+
+## test_main.py — exactly 6 tests per entity
+
+```python
+from fastapi.testclient import TestClient
+from sqlalchemy import create_engine
+from sqlalchemy.orm import sessionmaker
+from main import app, get_db
+from models import Base
+
+test_engine = create_engine("sqlite:///./test.db", connect_args={"check_same_thread": False})
+TestSession = sessionmaker(autocommit=False, autoflush=False, bind=test_engine)
+Base.metadata.create_all(bind=test_engine)
+
+def override_get_db():
+    db = TestSession()
+    try: yield db
+    finally: db.close()
+
+app.dependency_overrides[get_db] = override_get_db
+client = TestClient(app)
+
+def test_create_todo():
+    response = client.post("/todos/", json={"title": "Osta maitoa", "priority": 2})
+    assert response.status_code == 201
+    assert "id" in response.json()
+
+def test_list_todos():
+    client.post("/todos/", json={"title": "Listattava"})
+    response = client.get("/todos/")
+    assert response.status_code == 200
+    assert len(response.json()) >= 1
+
+def test_get_todo_by_id():
+    created = client.post("/todos/", json={"title": "Haettava"}).json()
+    response = client.get(f"/todos/{created['id']}")
+    assert response.status_code == 200
+
+def test_get_todo_not_found():
+    response = client.get("/todos/99999")
+    assert response.status_code == 404
+
+def test_update_todo():
+    created = client.post("/todos/", json={"title": "Vanha"}).json()
+    response = client.put(f"/todos/{created['id']}", json={"title": "Uusi"})
+    assert response.status_code == 200
+
+def test_delete_todo():
+    created = client.post("/todos/", json={"title": "Poistettava"}).json()
+    response = client.delete(f"/todos/{created['id']}")
+    assert response.status_code == 204
+```
+
+# Example 2: Blog (two entities with ForeignKey)
+
+NOTE: ForeignKey is imported from sqlalchemy, NOT from sqlalchemy.orm!
+
+## models.py
+
+```python
+from datetime import datetime
+from sqlalchemy import String, Text, DateTime, ForeignKey, create_engine
+from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship, sessionmaker
+
+DATABASE_URL = "sqlite:///./app.db"
+engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
+SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
+
+class Base(DeclarativeBase):
+    pass
+
+class Author(Base):
+    __tablename__ = "authors"
+    id: Mapped[int] = mapped_column(primary_key=True, index=True)
+    name: Mapped[str] = mapped_column(String(255))
+    email: Mapped[str] = mapped_column(String(255), unique=True)
+    bio: Mapped[str | None] = mapped_column(Text, default=None)
+    posts: Mapped[list["Post"]] = relationship(back_populates="author")
+
+class Post(Base):
+    __tablename__ = "posts"
+    id: Mapped[int] = mapped_column(primary_key=True, index=True)
+    title: Mapped[str] = mapped_column(String(255))
+    content: Mapped[str] = mapped_column(Text)
+    author_id: Mapped[int] = mapped_column(ForeignKey("authors.id"))
+    published_at: Mapped[datetime | None] = mapped_column(DateTime, default=None)
+    status: Mapped[str] = mapped_column(String(20), default="draft")
+    author: Mapped["Author"] = relationship(back_populates="posts")
+
+Base.metadata.create_all(bind=engine)
+```
+
+## schemas.py
+
+```python
+from datetime import datetime
+from pydantic import BaseModel, ConfigDict
+
+class AuthorCreate(BaseModel):
+    name: str
+    email: str
+    bio: str | None = None
+
+class AuthorResponse(AuthorCreate):
+    id: int
+    model_config = ConfigDict(from_attributes=True)
+
+class PostCreate(BaseModel):
+    title: str
+    content: str
+    author_id: int
+    published_at: datetime | None = None
+    status: str = "draft"
+
+class PostResponse(PostCreate):
+    id: int
+    model_config = ConfigDict(from_attributes=True)
+```
+
+## test_main.py — 6 tests per entity, create parent FIRST for child tests
+
+```python
+client = TestClient(app)  # same setup as above
+
+def _create_author(name="Author", email=None):
+    if email is None:
+        email = f"{name.lower().replace(' ', '.')}@example.com"
+    return client.post("/authors/", json={"name": name, "email": email}).json()
+
+def test_create_author():
+    response = client.post("/authors/", json={"name": "Aleksis Kivi", "email": "aleksis@example.com"})
+    assert response.status_code == 201
+
+def test_list_authors():
+    _create_author("Minna Canth", "minna@example.com")
+    response = client.get("/authors/")
+    assert response.status_code == 200
+    assert len(response.json()) >= 1
+
+# ... (same pattern: get_by_id, not_found, update, delete)
+
+def test_create_post():
+    author = _create_author("Tove Jansson", "tove@example.com")
+    response = client.post("/posts/", json={"title": "Article", "content": "Content", "author_id": author["id"]})
+    assert response.status_code == 201
+
+def test_update_post():
+    author = _create_author("Joel Lehtonen", "joel@example.com")
+    created = client.post("/posts/", json={"title": "Old", "content": "Text", "author_id": author["id"]}).json()
+    response = client.put(f"/posts/{created['id']}", json={"title": "New", "content": "Edited", "author_id": author["id"]})
+    assert response.status_code == 200
+
+def test_delete_post():
+    author = _create_author("Aino Kallas", "aino@example.com")
+    created = client.post("/posts/", json={"title": "Deletable", "content": "To be deleted", "author_id": author["id"]}).json()
+    response = client.delete(f"/posts/{created['id']}")
+    assert response.status_code == 204
+```
--- a/kipina-codebench/golden-examples/todo-go.md
+++ b/kipina-codebench/golden-examples/todo-go.md
@@ -0,0 +1,325 @@
+# Todo — reference implementation (Go + Chi + SQLite)
+
+This is a complete example. Generate an equivalent structure for the given project.
+Use ONLY the fields from the JSON spec — do not add extras.
+
+## go.mod
+
+Chi v5 router, modernc.org/sqlite (pure Go, no CGO).
+
+```
+module todo-go
+
+go 1.23.0
+
+toolchain go1.23.12
+
+require (
+	github.com/go-chi/chi/v5 v5.2.1
+	modernc.org/sqlite v1.37.1
+)
+
+require (
+	github.com/dustin/go-humanize v1.0.1 // indirect
+	github.com/google/uuid v1.6.0 // indirect
+	github.com/mattn/go-isatty v0.0.20 // indirect
+	github.com/ncruces/go-strftime v0.1.9 // indirect
+	github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect
+	golang.org/x/exp v0.0.0-20250408133849-7e4ce0ab07d0 // indirect
+	golang.org/x/sys v0.33.0 // indirect
+	modernc.org/libc v1.65.7 // indirect
+	modernc.org/mathutil v1.7.1 // indirect
+	modernc.org/memory v1.11.0 // indirect
+)
+```
+
+## models.go
+
+Data structs: Todo (full row), CreateTodo (POST), UpdateTodo (PUT, all fields optional pointers).
+
+```go
+package main
+
+// Todo represents a task with priority and status tracking.
+type Todo struct {
+	ID          int64   `json:"id"`
+	Title       string  `json:"title"`
+	Description *string `json:"description,omitempty"`
+	DueDate     *string `json:"due_date,omitempty"`
+	Priority    int64   `json:"priority"`
+	Status      string  `json:"status"`
+}
+
+// CreateTodo is the request body for creating a new todo.
+type CreateTodo struct {
+	Title       string  `json:"title"`
+	Description *string `json:"description,omitempty"`
+	DueDate     *string `json:"due_date,omitempty"`
+	Priority    *int64  `json:"priority,omitempty"`
+	Status      *string `json:"status,omitempty"`
+}
+
+// UpdateTodo is the request body for updating an existing todo.
+type UpdateTodo struct {
+	Title       *string `json:"title,omitempty"`
+	Description *string `json:"description,omitempty"`
+	DueDate     *string `json:"due_date,omitempty"`
+	Priority    *int64  `json:"priority,omitempty"`
+	Status      *string `json:"status,omitempty"`
+}
+```
+
+## handlers.go
+
+CRUD handlers as closures taking *sql.DB. Key patterns: INSERT RETURNING, sql.ErrNoRows for 404, RowsAffected for delete.
+
+```go
+package main
+
+import (
+	"database/sql"
+	"encoding/json"
+	"net/http"
+	"strconv"
+	"github.com/go-chi/chi/v5"
+)
+
+// POST — decode JSON, defaults with nil-check, INSERT RETURNING, StatusCreated.
+func createTodo(db *sql.DB) http.HandlerFunc {
+	return func(w http.ResponseWriter, r *http.Request) {
+		var input CreateTodo
+		if err := json.NewDecoder(r.Body).Decode(&input); err != nil {
+			http.Error(w, err.Error(), http.StatusBadRequest); return
+		}
+		priority := int64(1)
+		if input.Priority != nil { priority = *input.Priority }
+		status := "pending"
+		if input.Status != nil { status = *input.Status }
+		var todo Todo
+		err := db.QueryRow(
+			`INSERT INTO todos (title, description, due_date, priority, status)
+			 VALUES (?, ?, ?, ?, ?) RETURNING id, title, description, due_date, priority, status`,
+			input.Title, input.Description, input.DueDate, priority, status,
+		).Scan(&todo.ID, &todo.Title, &todo.Description, &todo.DueDate, &todo.Priority, &todo.Status)
+		if err != nil { http.Error(w, err.Error(), http.StatusInternalServerError); return }
+		w.Header().Set("Content-Type", "application/json")
+		w.WriteHeader(http.StatusCreated)
+		json.NewEncoder(w).Encode(todo)
+	}
+}
+
+// GET list — db.Query + rows.Scan loop, empty slice not nil.
+func listTodos(db *sql.DB) http.HandlerFunc {
+	return func(w http.ResponseWriter, r *http.Request) {
+		rows, err := db.Query("SELECT id, title, description, due_date, priority, status FROM todos")
+		if err != nil { http.Error(w, err.Error(), http.StatusInternalServerError); return }
+		defer rows.Close()
+		todos := []Todo{}
+		for rows.Next() {
+			var t Todo
+			if err := rows.Scan(&t.ID, &t.Title, &t.Description, &t.DueDate, &t.Priority, &t.Status); err != nil { http.Error(w, err.Error(), http.StatusInternalServerError); return }
+			todos = append(todos, t)
+		}
+		w.Header().Set("Content-Type", "application/json")
+		json.NewEncoder(w).Encode(todos)
+	}
+}
+
+// GET by id — QueryRow + sql.ErrNoRows → 404.
+func getTodo(db *sql.DB) http.HandlerFunc {
+	return func(w http.ResponseWriter, r *http.Request) {
+		id, _ := strconv.ParseInt(chi.URLParam(r, "id"), 10, 64)
+		var todo Todo
+		err := db.QueryRow(
+			"SELECT id, title, description, due_date, priority, status FROM todos WHERE id = ?", id,
+		).Scan(&todo.ID, &todo.Title, &todo.Description, &todo.DueDate, &todo.Priority, &todo.Status)
+		if err == sql.ErrNoRows { http.Error(w, "not found", http.StatusNotFound); return }
+		if err != nil { http.Error(w, err.Error(), http.StatusInternalServerError); return }
+		w.Header().Set("Content-Type", "application/json")
+		json.NewEncoder(w).Encode(todo)
+	}
+}
+
+// PUT — fetch existing, merge with input nil-checks, UPDATE RETURNING.
+func updateTodo(db *sql.DB) http.HandlerFunc {
+	return func(w http.ResponseWriter, r *http.Request) {
+		id, _ := strconv.ParseInt(chi.URLParam(r, "id"), 10, 64)
+		var existing Todo
+		err := db.QueryRow("SELECT id, title, description, due_date, priority, status FROM todos WHERE id = ?", id,
+		).Scan(&existing.ID, &existing.Title, &existing.Description, &existing.DueDate, &existing.Priority, &existing.Status)
+		if err == sql.ErrNoRows { http.Error(w, "not found", http.StatusNotFound); return }
+		if err != nil { http.Error(w, err.Error(), http.StatusInternalServerError); return }
+		var input UpdateTodo
+		if err := json.NewDecoder(r.Body).Decode(&input); err != nil {
+			http.Error(w, err.Error(), http.StatusBadRequest); return
+		}
+		if input.Title != nil { existing.Title = *input.Title }
+		if input.Description != nil { existing.Description = input.Description }
+		if input.DueDate != nil { existing.DueDate = input.DueDate }
+		if input.Priority != nil { existing.Priority = *input.Priority }
+		if input.Status != nil { existing.Status = *input.Status }
+		var updated Todo
+		err = db.QueryRow(
+			`UPDATE todos SET title=?, description=?, due_date=?, priority=?, status=? WHERE id=?
+			 RETURNING id, title, description, due_date, priority, status`,
+			existing.Title, existing.Description, existing.DueDate, existing.Priority, existing.Status, id,
+		).Scan(&updated.ID, &updated.Title, &updated.Description, &updated.DueDate, &updated.Priority, &updated.Status)
+		if err != nil { http.Error(w, err.Error(), http.StatusInternalServerError); return }
+		w.Header().Set("Content-Type", "application/json")
+		json.NewEncoder(w).Encode(updated)
+	}
+}
+
+// DELETE — Exec + RowsAffected == 0 → 404, else 204.
+func deleteTodo(db *sql.DB) http.HandlerFunc {
+	return func(w http.ResponseWriter, r *http.Request) {
+		id, _ := strconv.ParseInt(chi.URLParam(r, "id"), 10, 64)
+		result, err := db.Exec("DELETE FROM todos WHERE id = ?", id)
+		if err != nil { http.Error(w, err.Error(), http.StatusInternalServerError); return }
+		rows, _ := result.RowsAffected()
+		if rows == 0 { http.Error(w, "not found", http.StatusNotFound); return }
+		w.WriteHeader(http.StatusNoContent)
+	}
+}
+```
+
+## main.go
+
+Entry point: SQLite connection, table init, Chi router on port 3000.
+
+```go
+package main
+
+import (
+	"database/sql"
+	"fmt"
+	"log"
+	"net/http"
+
+	"github.com/go-chi/chi/v5"
+	_ "modernc.org/sqlite"
+)
+
+// InitDB creates tables if they don't exist.
+func InitDB(db *sql.DB) {
+	_, err := db.Exec(`CREATE TABLE IF NOT EXISTS todos (
+		id          INTEGER PRIMARY KEY AUTOINCREMENT,
+		title       TEXT NOT NULL,
+		description TEXT,
+		due_date    TEXT,
+		priority    INTEGER NOT NULL DEFAULT 1,
+		status      TEXT NOT NULL DEFAULT 'pending'
+	)`)
+	if err != nil {
+		log.Fatal(err)
+	}
+}
+
+// NewRouter creates a chi router with all routes.
+func NewRouter(db *sql.DB) http.Handler {
+	r := chi.NewRouter()
+	r.Post("/todos", createTodo(db))
+	r.Get("/todos", listTodos(db))
+	r.Get("/todos/{id}", getTodo(db))
+	r.Put("/todos/{id}", updateTodo(db))
+	r.Delete("/todos/{id}", deleteTodo(db))
+	return r
+}
+
+func main() {
+	db, err := sql.Open("sqlite", "file:app.db?mode=rwc")
+	if err != nil {
+		log.Fatal(err)
+	}
+	defer db.Close()
+	InitDB(db)
+
+	fmt.Println("Server running: http://127.0.0.1:3000")
+	log.Fatal(http.ListenAndServe("127.0.0.1:3000", NewRouter(db)))
+}
+```
+
+## handlers_test.go
+
+Integration tests: setupTestServer with httptest.NewServer + :memory: SQLite, unique data per test.
+
+```go
+package main
+
+import (
+	"database/sql"
+	"encoding/json"
+	"fmt"
+	"net/http"
+	"net/http/httptest"
+	"strings"
+	"testing"
+	_ "modernc.org/sqlite"
+)
+
+func setupTestServer(t *testing.T) (*httptest.Server, *sql.DB) {
+	t.Helper()
+	db, err := sql.Open("sqlite", ":memory:")
+	if err != nil { t.Fatal(err) }
+	db.SetMaxOpenConns(1) // each pooled connection to :memory: opens its own empty database
+	InitDB(db)
+	return httptest.NewServer(NewRouter(db)), db
+}
+
+func TestCreateTodo(t *testing.T) {
+	ts, db := setupTestServer(t)
+	defer ts.Close()
+	defer db.Close()
+	resp, err := http.Post(ts.URL+"/todos", "application/json",
+		strings.NewReader(`{"title":"Buy groceries","priority":2}`))
+	if err != nil { t.Fatal(err) }
+	defer resp.Body.Close()
+	if resp.StatusCode != http.StatusCreated { t.Fatalf("expected 201, got %d", resp.StatusCode) }
+	var body map[string]interface{}
+	json.NewDecoder(resp.Body).Decode(&body)
+	if body["title"] != "Buy groceries" { t.Fatalf("expected 'Buy groceries', got %v", body["title"]) }
+	if body["id"] == nil { t.Fatal("expected id") }
+}
+
+func TestGetTodoByID(t *testing.T) {
+	ts, db := setupTestServer(t)
+	defer ts.Close()
+	defer db.Close()
+	resp, _ := http.Post(ts.URL+"/todos", "application/json",
+		strings.NewReader(`{"title":"Fetchable task"}`))
+	var created map[string]interface{}
+	json.NewDecoder(resp.Body).Decode(&created)
+	resp.Body.Close()
+	id := created["id"].(float64)
+	resp, _ = http.Get(ts.URL + "/todos/" + fmt.Sprintf("%.0f", id))
+	defer resp.Body.Close()
+	if resp.StatusCode != http.StatusOK { t.Fatalf("expected 200, got %d", resp.StatusCode) }
+}
+
+func TestGetTodoNotFound(t *testing.T) {
+	ts, db := setupTestServer(t)
+	defer ts.Close()
+	defer db.Close()
+	resp, _ := http.Get(ts.URL + "/todos/99999")
+	defer resp.Body.Close()
+	if resp.StatusCode != http.StatusNotFound { t.Fatalf("expected 404, got %d", resp.StatusCode) }
+}
+
+func TestDeleteTodo(t *testing.T) {
+	ts, db := setupTestServer(t)
+	defer ts.Close()
+	defer db.Close()
+	resp, _ := http.Post(ts.URL+"/todos", "application/json",
+		strings.NewReader(`{"title":"Deletable task"}`))
+	var created map[string]interface{}
+	json.NewDecoder(resp.Body).Decode(&created)
+	resp.Body.Close()
+	id := created["id"].(float64)
+	req, _ := http.NewRequest(http.MethodDelete, ts.URL+"/todos/"+fmt.Sprintf("%.0f", id), nil)
+	resp, _ = http.DefaultClient.Do(req)
+	defer resp.Body.Close()
+	if resp.StatusCode != http.StatusNoContent { t.Fatalf("expected 204, got %d", resp.StatusCode) }
+	resp, _ = http.Get(ts.URL + "/todos/" + fmt.Sprintf("%.0f", id))
+	defer resp.Body.Close()
+	if resp.StatusCode != http.StatusNotFound { t.Fatalf("expected 404 after delete, got %d", resp.StatusCode) }
+}
+```
--- a/kipina-codebench/golden-examples/todo-go/go.mod
+++ b/kipina-codebench/golden-examples/todo-go/go.mod
@@ -0,0 +1,23 @@
+module todo-go
+
+go 1.23.0
+
+toolchain go1.23.12
+
+require (
+	github.com/go-chi/chi/v5 v5.2.1
+	modernc.org/sqlite v1.37.1
+)
+
+require (
+	github.com/dustin/go-humanize v1.0.1 // indirect
+	github.com/google/uuid v1.6.0 // indirect
+	github.com/mattn/go-isatty v0.0.20 // indirect
+	github.com/ncruces/go-strftime v0.1.9 // indirect
+	github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec // indirect
+	golang.org/x/exp v0.0.0-20250408133849-7e4ce0ab07d0 // indirect
+	golang.org/x/sys v0.33.0 // indirect
+	modernc.org/libc v1.65.7 // indirect
+	modernc.org/mathutil v1.7.1 // indirect
+	modernc.org/memory v1.11.0 // indirect
+)
--- a/kipina-codebench/golden-examples/todo-go/go.sum
+++ b/kipina-codebench/golden-examples/todo-go/go.sum
@@ -0,0 +1,49 @@
+github.com/dustin/go-humanize v1.0.1 h1:GzkhY7T5VNhEkwH0PVJgjz+fX1rhBrR7pRT3mDkpeCY=
+github.com/dustin/go-humanize v1.0.1/go.mod h1:Mu1zIs6XwVuF/gI1OepvI0qD18qycQx+mFykh5fBlto=
+github.com/go-chi/chi/v5 v5.2.1 h1:KOIHODQj58PmL80G2Eak4WdvUzjSJSm0vG72crDCqb8=
+github.com/go-chi/chi/v5 v5.2.1/go.mod h1:L2yAIGWB3H+phAw1NxKwWM+7eUH/lU8pOMm5hHcoops=
+github.com/google/pprof v0.0.0-20250317173921-a4b03ec1a45e h1:ijClszYn+mADRFY17kjQEVQ1XRhq2/JR1M3sGqeJoxs=
+github.com/google/pprof v0.0.0-20250317173921-a4b03ec1a45e/go.mod h1:boTsfXsheKC2y+lKOCMpSfarhxDeIzfZG1jqGcPl3cA=
+github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
+github.com/google/uuid v1.6.0/go.mod h1:TIyPZe4MgqvfeYDBFedMoGGpEw/LqOeaOT+nhxU+yHo=
+github.com/mattn/go-isatty v0.0.20 h1:xfD0iDuEKnDkl03q4limB+vH+GxLEtL/jb4xVJSWWEY=
+github.com/mattn/go-isatty v0.0.20/go.mod h1:W+V8PltTTMOvKvAeJH7IuucS94S2C6jfK/D7dTCTo3Y=
+github.com/ncruces/go-strftime v0.1.9 h1:bY0MQC28UADQmHmaF5dgpLmImcShSi2kHU9XLdhx/f4=
+github.com/ncruces/go-strftime v0.1.9/go.mod h1:Fwc5htZGVVkseilnfgOVb9mKy6w1naJmn9CehxcKcls=
+github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec h1:W09IVJc94icq4NjY3clb7Lk8O1qJ8BdBEF8z0ibU0rE=
+github.com/remyoudompheng/bigfft v0.0.0-20230129092748-24d4a6f8daec/go.mod h1:qqbHyh8v60DhA7CoWK5oRCqLrMHRGoxYCSS9EjAz6Eo=
+golang.org/x/exp v0.0.0-20250408133849-7e4ce0ab07d0 h1:R84qjqJb5nVJMxqWYb3np9L5ZsaDtB+a39EqjV0JSUM=
+golang.org/x/exp v0.0.0-20250408133849-7e4ce0ab07d0/go.mod h1:S9Xr4PYopiDyqSyp5NjCrhFrqg6A5zA2E/iPHPhqnS8=
+golang.org/x/mod v0.24.0 h1:ZfthKaKaT4NrhGVZHO1/WDTwGES4De8KtWO0SIbNJMU=
+golang.org/x/mod v0.24.0/go.mod h1:IXM97Txy2VM4PJ3gI61r1YEk/gAj6zAHN3AdZt6S9Ww=
+golang.org/x/sync v0.14.0 h1:woo0S4Yywslg6hp4eUFjTVOyKt0RookbpAHG4c1HmhQ=
+golang.org/x/sync v0.14.0/go.mod h1:1dzgHSNfp02xaA81J2MS99Qcpr2w7fw1gpm99rleRqA=
+golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
+golang.org/x/sys v0.33.0 h1:q3i8TbbEz+JRD9ywIRlyRAQbM0qF7hu24q3teo2hbuw=
+golang.org/x/sys v0.33.0/go.mod h1:BJP2sWEmIv4KK5OTEluFJCKSidICx8ciO85XgH3Ak8k=
+golang.org/x/tools v0.33.0 h1:4qz2S3zmRxbGIhDIAgjxvFutSvH5EfnsYrRBj0UI0bc=
+golang.org/x/tools v0.33.0/go.mod h1:CIJMaWEY88juyUfo7UbgPqbC8rU2OqfAV1h2Qp0oMYI=
+modernc.org/cc/v4 v4.26.1 h1:+X5NtzVBn0KgsBCBe+xkDC7twLb/jNVj9FPgiwSQO3s=
+modernc.org/cc/v4 v4.26.1/go.mod h1:uVtb5OGqUKpoLWhqwNQo/8LwvoiEBLvZXIQ/SmO6mL0=
+modernc.org/ccgo/v4 v4.28.0 h1:rjznn6WWehKq7dG4JtLRKxb52Ecv8OUGah8+Z/SfpNU=
+modernc.org/ccgo/v4 v4.28.0/go.mod h1:JygV3+9AV6SmPhDasu4JgquwU81XAKLd3OKTUDNOiKE=
+modernc.org/fileutil v1.3.1 h1:8vq5fe7jdtEvoCf3Zf9Nm0Q05sH6kGx0Op2CPx1wTC8=
+modernc.org/fileutil v1.3.1/go.mod h1:HxmghZSZVAz/LXcMNwZPA/DRrQZEVP9VX0V4LQGQFOc=
+modernc.org/gc/v2 v2.6.5 h1:nyqdV8q46KvTpZlsw66kWqwXRHdjIlJOhG6kxiV/9xI=
+modernc.org/gc/v2 v2.6.5/go.mod h1:YgIahr1ypgfe7chRuJi2gD7DBQiKSLMPgBQe9oIiito=
+modernc.org/libc v1.65.7 h1:Ia9Z4yzZtWNtUIuiPuQ7Qf7kxYrxP1/jeHZzG8bFu00=
+modernc.org/libc v1.65.7/go.mod h1:011EQibzzio/VX3ygj1qGFt5kMjP0lHb0qCW5/D/pQU=
+modernc.org/mathutil v1.7.1 h1:GCZVGXdaN8gTqB1Mf/usp1Y/hSqgI2vAGGP4jZMCxOU=
+modernc.org/mathutil v1.7.1/go.mod h1:4p5IwJITfppl0G4sUEDtCr4DthTaT47/N3aT6MhfgJg=
+modernc.org/memory v1.11.0 h1:o4QC8aMQzmcwCK3t3Ux/ZHmwFPzE6hf2Y5LbkRs+hbI=
+modernc.org/memory v1.11.0/go.mod h1:/JP4VbVC+K5sU2wZi9bHoq2MAkCnrt2r98UGeSK7Mjw=
+modernc.org/opt v0.1.4 h1:2kNGMRiUjrp4LcaPuLY2PzUfqM/w9N23quVwhKt5Qm8=
+modernc.org/opt v0.1.4/go.mod h1:03fq9lsNfvkYSfxrfUhZCWPk1lm4cq4N+Bh//bEtgns=
+modernc.org/sortutil v1.2.1 h1:+xyoGf15mM3NMlPDnFqrteY07klSFxLElE2PVuWIJ7w=
+modernc.org/sortutil v1.2.1/go.mod h1:7ZI3a3REbai7gzCLcotuw9AC4VZVpYMjDzETGsSMqJE=
+modernc.org/sqlite v1.37.1 h1:EgHJK/FPoqC+q2YBXg7fUmES37pCHFc97sI7zSayBEs=
+modernc.org/sqlite v1.37.1/go.mod h1:XwdRtsE1MpiBcL54+MbKcaDvcuej+IYSMfLN6gSKV8g=
+modernc.org/strutil v1.2.1 h1:UneZBkQA+DX2Rp35KcM69cSsNES9ly8mQWD71HKlOA0=
+modernc.org/strutil v1.2.1/go.mod h1:EHkiggD70koQxjVdSBM3JKM7k6L0FbGE5eymy9i3B9A=
+modernc.org/token v1.1.0 h1:Xl7Ap9dKaEs5kLoOQeQmPWevfnk/DM5qcLcYlA8ys6Y=
+modernc.org/token v1.1.0/go.mod h1:UGzOrNV1mAFSEB63lOFHIpNRUVMvYTc6yu1SMY/XTDM=
--- a/kipina-codebench/golden-examples/todo-go/handlers.go
+++ b/kipina-codebench/golden-examples/todo-go/handlers.go
@@ -0,0 +1,155 @@
+package main
+
+import (
+	"database/sql"
+	"encoding/json"
+	"net/http"
+	"strconv"
+
+	"github.com/go-chi/chi/v5"
+)
+
+func createTodo(db *sql.DB) http.HandlerFunc {
+	return func(w http.ResponseWriter, r *http.Request) {
+		var input CreateTodo
+		if err := json.NewDecoder(r.Body).Decode(&input); err != nil {
+			http.Error(w, err.Error(), http.StatusBadRequest)
+			return
+		}
+		priority := int64(1)
+		if input.Priority != nil {
+			priority = *input.Priority
+		}
+		status := "pending"
+		if input.Status != nil {
+			status = *input.Status
+		}
+		var todo Todo
+		err := db.QueryRow(
+			`INSERT INTO todos (title, description, due_date, priority, status)
+			 VALUES (?, ?, ?, ?, ?)
+			 RETURNING id, title, description, due_date, priority, status`,
+			input.Title, input.Description, input.DueDate, priority, status,
+		).Scan(&todo.ID, &todo.Title, &todo.Description, &todo.DueDate, &todo.Priority, &todo.Status)
+		if err != nil {
+			http.Error(w, err.Error(), http.StatusInternalServerError)
+			return
+		}
+		w.Header().Set("Content-Type", "application/json")
+		w.WriteHeader(http.StatusCreated)
+		json.NewEncoder(w).Encode(todo)
+	}
+}
+
+func listTodos(db *sql.DB) http.HandlerFunc {
+	return func(w http.ResponseWriter, r *http.Request) {
+		rows, err := db.Query("SELECT id, title, description, due_date, priority, status FROM todos")
+		if err != nil {
+			http.Error(w, err.Error(), http.StatusInternalServerError)
+			return
+		}
+		defer rows.Close()
+		var todos []Todo
+		for rows.Next() {
+			var t Todo
+			if err := rows.Scan(&t.ID, &t.Title, &t.Description, &t.DueDate, &t.Priority, &t.Status); err != nil {
+				http.Error(w, err.Error(), http.StatusInternalServerError)
+				return
+			}
+			todos = append(todos, t)
+		}
+		if todos == nil {
+			todos = []Todo{}
+		}
+		w.Header().Set("Content-Type", "application/json")
+		json.NewEncoder(w).Encode(todos)
+	}
+}
+
+func getTodo(db *sql.DB) http.HandlerFunc {
+	return func(w http.ResponseWriter, r *http.Request) {
+		id, _ := strconv.ParseInt(chi.URLParam(r, "id"), 10, 64)
+		var todo Todo
+		err := db.QueryRow(
+			"SELECT id, title, description, due_date, priority, status FROM todos WHERE id = ?", id,
+		).Scan(&todo.ID, &todo.Title, &todo.Description, &todo.DueDate, &todo.Priority, &todo.Status)
+		if err == sql.ErrNoRows {
+			http.Error(w, "not found", http.StatusNotFound)
+			return
+		}
+		if err != nil {
+			http.Error(w, err.Error(), http.StatusInternalServerError)
+			return
+		}
+		w.Header().Set("Content-Type", "application/json")
+		json.NewEncoder(w).Encode(todo)
+	}
+}
+
+func updateTodo(db *sql.DB) http.HandlerFunc {
+	return func(w http.ResponseWriter, r *http.Request) {
+		id, _ := strconv.ParseInt(chi.URLParam(r, "id"), 10, 64)
+		var existing Todo
+		err := db.QueryRow(
+			"SELECT id, title, description, due_date, priority, status FROM todos WHERE id = ?", id,
+		).Scan(&existing.ID, &existing.Title, &existing.Description, &existing.DueDate, &existing.Priority, &existing.Status)
+		if err == sql.ErrNoRows {
+			http.Error(w, "not found", http.StatusNotFound)
+			return
+		}
+		if err != nil {
+			http.Error(w, err.Error(), http.StatusInternalServerError)
+			return
+		}
+		var input UpdateTodo
+		if err := json.NewDecoder(r.Body).Decode(&input); err != nil {
+			http.Error(w, err.Error(), http.StatusBadRequest)
+			return
+		}
+		if input.Title != nil {
+			existing.Title = *input.Title
+		}
+		if input.Description != nil {
+			existing.Description = input.Description
+		}
+		if input.DueDate != nil {
+			existing.DueDate = input.DueDate
+		}
+		if input.Priority != nil {
+			existing.Priority = *input.Priority
+		}
+		if input.Status != nil {
+			existing.Status = *input.Status
+		}
+		var updated Todo
+		err = db.QueryRow(
+			`UPDATE todos SET title = ?, description = ?, due_date = ?, priority = ?, status = ?
+			 WHERE id = ?
+			 RETURNING id, title, description, due_date, priority, status`,
+			existing.Title, existing.Description, existing.DueDate, existing.Priority, existing.Status, id,
+		).Scan(&updated.ID, &updated.Title, &updated.Description, &updated.DueDate, &updated.Priority, &updated.Status)
+		if err != nil {
+			http.Error(w, err.Error(), http.StatusInternalServerError)
+			return
+		}
+		w.Header().Set("Content-Type", "application/json")
+		json.NewEncoder(w).Encode(updated)
+	}
+}
+
+func deleteTodo(db *sql.DB) http.HandlerFunc {
+	return func(w http.ResponseWriter, r *http.Request) {
+		id, _ := strconv.ParseInt(chi.URLParam(r, "id"), 10, 64)
+		result, err := db.Exec("DELETE FROM todos WHERE id = ?", id)
+		if err != nil {
+			http.Error(w, err.Error(), http.StatusInternalServerError)
+			return
+		}
+		rows, _ := result.RowsAffected()
+		if rows == 0 {
+			http.Error(w, "not found", http.StatusNotFound)
+			return
+		}
+		w.WriteHeader(http.StatusNoContent)
+	}
+}
--- a/kipina-codebench/golden-examples/todo-go/handlers_test.go
+++ b/kipina-codebench/golden-examples/todo-go/handlers_test.go
@@ -0,0 +1,171 @@
+package main
+
+import (
+	"database/sql"
+	"encoding/json"
+	"fmt"
+	"net/http"
+	"net/http/httptest"
+	"strings"
+	"testing"
+
+	_ "modernc.org/sqlite"
+)
+
+func setupTestServer(t *testing.T) (*httptest.Server, *sql.DB) {
+	t.Helper()
+	db, err := sql.Open("sqlite", ":memory:")
+	if err != nil {
+		t.Fatal(err)
+	}
+	InitDB(db)
+	return httptest.NewServer(NewRouter(db)), db
+}
+
+func TestCreateTodo(t *testing.T) {
+	ts, db := setupTestServer(t)
+	defer ts.Close()
+	defer db.Close()
+
+	resp, err := http.Post(ts.URL+"/todos", "application/json",
+		strings.NewReader(`{"title":"Buy groceries","priority":2}`))
+	if err != nil {
+		t.Fatal(err)
+	}
+	defer resp.Body.Close()
+	if resp.StatusCode != http.StatusCreated {
+		t.Fatalf("expected 201, got %d", resp.StatusCode)
+	}
+	var body map[string]interface{}
+	json.NewDecoder(resp.Body).Decode(&body)
+	if body["title"] != "Buy groceries" {
+		t.Fatalf("expected title 'Buy groceries', got %v", body["title"])
+	}
+	if body["id"] == nil {
+		t.Fatal("expected id to be present")
+	}
+}
+
+func TestListTodos(t *testing.T) {
+	ts, db := setupTestServer(t)
+	defer ts.Close()
+	defer db.Close()
+
+	http.Post(ts.URL+"/todos", "application/json",
+		strings.NewReader(`{"title":"Listable task"}`))
+
+	resp, err := http.Get(ts.URL + "/todos")
+	if err != nil {
+		t.Fatal(err)
+	}
+	defer resp.Body.Close()
+	if resp.StatusCode != http.StatusOK {
+		t.Fatalf("expected 200, got %d", resp.StatusCode)
+	}
+	var body []map[string]interface{}
+	json.NewDecoder(resp.Body).Decode(&body)
+	if len(body) < 1 {
+		t.Fatal("expected at least 1 todo")
+	}
+}
+
+func TestGetTodoByID(t *testing.T) {
+	ts, db := setupTestServer(t)
+	defer ts.Close()
+	defer db.Close()
+
+	resp, _ := http.Post(ts.URL+"/todos", "application/json",
+		strings.NewReader(`{"title":"Fetchable task"}`))
+	var created map[string]interface{}
+	json.NewDecoder(resp.Body).Decode(&created)
+	resp.Body.Close()
+
+	id := created["id"].(float64)
+	resp, err := http.Get(ts.URL + "/todos/" + fmt.Sprintf("%.0f", id))
+	if err != nil {
+		t.Fatal(err)
+	}
+	defer resp.Body.Close()
+	if resp.StatusCode != http.StatusOK {
+		t.Fatalf("expected 200, got %d", resp.StatusCode)
+	}
+	var body map[string]interface{}
+	json.NewDecoder(resp.Body).Decode(&body)
+	if body["id"] != id {
+		t.Fatalf("expected id %.0f, got %v", id, body["id"])
+	}
+}
+
+func TestGetTodoNotFound(t *testing.T) {
+	ts, db := setupTestServer(t)
+	defer ts.Close()
+	defer db.Close()
+
+	resp, err := http.Get(ts.URL + "/todos/99999")
+	if err != nil {
+		t.Fatal(err)
+	}
+	defer resp.Body.Close()
+	if resp.StatusCode != http.StatusNotFound {
+		t.Fatalf("expected 404, got %d", resp.StatusCode)
+	}
+}
+
+func TestUpdateTodo(t *testing.T) {
+	ts, db := setupTestServer(t)
+	defer ts.Close()
+	defer db.Close()
+
+	resp, _ := http.Post(ts.URL+"/todos", "application/json",
+		strings.NewReader(`{"title":"Old title"}`))
+	var created map[string]interface{}
+	json.NewDecoder(resp.Body).Decode(&created)
+	resp.Body.Close()
+
+	id := created["id"].(float64)
+	req, _ := http.NewRequest(http.MethodPut, ts.URL+"/todos/"+fmt.Sprintf("%.0f", id),
+		strings.NewReader(`{"title":"New title"}`))
+	req.Header.Set("Content-Type", "application/json")
+	resp, err := http.DefaultClient.Do(req)
+	if err != nil {
+		t.Fatal(err)
+	}
+	defer resp.Body.Close()
+	if resp.StatusCode != http.StatusOK {
+		t.Fatalf("expected 200, got %d", resp.StatusCode)
+	}
+	var body map[string]interface{}
+	json.NewDecoder(resp.Body).Decode(&body)
+	if body["title"] != "New title" {
+		t.Fatalf("expected 'New title', got %v", body["title"])
+	}
+}
+
+func TestDeleteTodo(t *testing.T) {
+	ts, db := setupTestServer(t)
+	defer ts.Close()
+	defer db.Close()
+
+	resp, _ := http.Post(ts.URL+"/todos", "application/json",
+		strings.NewReader(`{"title":"Deletable task"}`))
+	var created map[string]interface{}
+	json.NewDecoder(resp.Body).Decode(&created)
+	resp.Body.Close()
+
+	id := created["id"].(float64)
+	req, _ := http.NewRequest(http.MethodDelete, ts.URL+"/todos/"+fmt.Sprintf("%.0f", id), nil)
+	resp, err := http.DefaultClient.Do(req)
+	if err != nil {
+		t.Fatal(err)
+	}
+	defer resp.Body.Close()
+	if resp.StatusCode != http.StatusNoContent {
+		t.Fatalf("expected 204, got %d", resp.StatusCode)
+	}
+
+	resp, _ = http.Get(ts.URL + "/todos/" + fmt.Sprintf("%.0f", id))
+	defer resp.Body.Close()
+	if resp.StatusCode != http.StatusNotFound {
+		t.Fatalf("expected 404 after delete, got %d", resp.StatusCode)
+	}
+}
--- a/kipina-codebench/golden-examples/todo-go/main.go
+++ b/kipina-codebench/golden-examples/todo-go/main.go
@@ -0,0 +1,49 @@
+package main
+
+import (
+	"database/sql"
+	"fmt"
+	"log"
+	"net/http"
+
+	"github.com/go-chi/chi/v5"
+	_ "modernc.org/sqlite"
+)
+
+// InitDB creates tables if they don't exist.
+func InitDB(db *sql.DB) {
+	_, err := db.Exec(`CREATE TABLE IF NOT EXISTS todos (
+		id          INTEGER PRIMARY KEY AUTOINCREMENT,
+		title       TEXT NOT NULL,
+		description TEXT,
+		due_date    TEXT,
+		priority    INTEGER NOT NULL DEFAULT 1,
+		status      TEXT NOT NULL DEFAULT 'pending'
+	)`)
+	if err != nil {
+		log.Fatal(err)
+	}
+}
+
+// NewRouter creates a chi router with all routes.
+func NewRouter(db *sql.DB) http.Handler {
+	r := chi.NewRouter()
+	r.Post("/todos", createTodo(db))
+	r.Get("/todos", listTodos(db))
+	r.Get("/todos/{id}", getTodo(db))
+	r.Put("/todos/{id}", updateTodo(db))
+	r.Delete("/todos/{id}", deleteTodo(db))
+	return r
+}
+
+func main() {
+	db, err := sql.Open("sqlite", "file:app.db?mode=rwc")
+	if err != nil {
+		log.Fatal(err)
+	}
+	defer db.Close()
+	InitDB(db)
+
+	fmt.Println("Server running: http://127.0.0.1:3000")
+	log.Fatal(http.ListenAndServe("127.0.0.1:3000", NewRouter(db)))
+}
--- a/kipina-codebench/golden-examples/todo-go/models.go
+++ b/kipina-codebench/golden-examples/todo-go/models.go
@@ -0,0 +1,29 @@
+package main
+
+// Todo represents a task with priority and status tracking.
+type Todo struct {
+	ID          int64   `json:"id"`
+	Title       string  `json:"title"`
+	Description *string `json:"description,omitempty"`
+	DueDate     *string `json:"due_date,omitempty"`
+	Priority    int64   `json:"priority"`
+	Status      string  `json:"status"`
+}
+
+// CreateTodo is the request body for creating a new todo.
+type CreateTodo struct {
+	Title       string  `json:"title"`
+	Description *string `json:"description,omitempty"`
+	DueDate     *string `json:"due_date,omitempty"`
+	Priority    *int64  `json:"priority,omitempty"`
+	Status      *string `json:"status,omitempty"`
+}
+
+// UpdateTodo is the request body for updating an existing todo.
+type UpdateTodo struct {
+	Title       *string `json:"title,omitempty"`
+	Description *string `json:"description,omitempty"`
+	DueDate     *string `json:"due_date,omitempty"`
+	Priority    *int64  `json:"priority,omitempty"`
+	Status      *string `json:"status,omitempty"`
+}
--- a/kipina-codebench/golden-examples/todo-readme.md
+++ b/kipina-codebench/golden-examples/todo-readme.md
@@ -0,0 +1,217 @@
+# Todo App — FastAPI + SQLAlchemy + SQLite
+
+A simple todo CRUD API. Uses only the fields defined in the spec — no extra fields.
+
+## Project Structure
+
+```
+models.py       # SQLAlchemy 2.0 models
+schemas.py      # Pydantic v2 schemas
+main.py         # FastAPI CRUD endpoints
+test_main.py    # Pytest with TestClient
+```
+
+## models.py
+
+```python
+"""Database models: SQLAlchemy 2.0, Mapped typing, SQLite."""
+
+from datetime import date
+
+from sqlalchemy import String, Text, Date, create_engine
+from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, sessionmaker
+
+DATABASE_URL = "sqlite:///./app.db"
+engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
+SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
+
+
+class Base(DeclarativeBase):
+    pass
+
+
+class Todo(Base):
+    """A task: title, description, due date, priority, and status."""
+
+    __tablename__ = "todos"
+
+    id: Mapped[int] = mapped_column(primary_key=True, index=True)
+    title: Mapped[str] = mapped_column(String(255))
+    description: Mapped[str | None] = mapped_column(Text, default=None)
+    due_date: Mapped[date | None] = mapped_column(Date, default=None)
+    priority: Mapped[int] = mapped_column(default=1)
+    status: Mapped[str] = mapped_column(String(20), default="pending")
+
+
+Base.metadata.create_all(bind=engine)
+```
+
+## schemas.py
+
+```python
+"""Pydantic v2 schemas: Create for input, Response for output."""
+
+from datetime import date
+
+from pydantic import BaseModel, ConfigDict
+
+
+class TodoCreate(BaseModel):
+    """Payload for creating a task. Required: title."""
+
+    title: str
+    description: str | None = None
+    due_date: date | None = None
+    priority: int = 1
+    status: str = "pending"
+
+
+class TodoResponse(TodoCreate):
+    """Task returned by the API; includes the id."""
+
+    id: int
+    model_config = ConfigDict(from_attributes=True)
+```
+
+## main.py
+
+```python
+"""FastAPI CRUD: one set of endpoints per entity."""
+
+from fastapi import FastAPI, Depends, HTTPException
+from sqlalchemy.orm import Session
+
+from models import SessionLocal, Todo
+from schemas import TodoCreate, TodoResponse
+
+app = FastAPI()
+
+
+def get_db():
+    """Yield one database session per request."""
+    db = SessionLocal()
+    try:
+        yield db
+    finally:
+        db.close()
+
+
+@app.post("/todos/", response_model=TodoResponse, status_code=201)
+def create_todo(item: TodoCreate, db: Session = Depends(get_db)):
+    db_item = Todo(**item.model_dump())
+    db.add(db_item)
+    db.commit()
+    db.refresh(db_item)
+    return db_item
+
+
+@app.get("/todos/", response_model=list[TodoResponse])
+def list_todos(db: Session = Depends(get_db)):
+    return db.query(Todo).all()
+
+
+@app.get("/todos/{item_id}", response_model=TodoResponse)
+def get_todo(item_id: int, db: Session = Depends(get_db)):
+    item = db.query(Todo).filter(Todo.id == item_id).first()
+    if not item:
+        raise HTTPException(status_code=404, detail="Todo not found")
+    return item
+
+
+@app.put("/todos/{item_id}", response_model=TodoResponse)
+def update_todo(item_id: int, item: TodoCreate, db: Session = Depends(get_db)):
+    db_item = db.query(Todo).filter(Todo.id == item_id).first()
+    if not db_item:
+        raise HTTPException(status_code=404, detail="Todo not found")
+    for key, value in item.model_dump().items():
+        setattr(db_item, key, value)
+    db.commit()
+    db.refresh(db_item)
+    return db_item
+
+
+@app.delete("/todos/{item_id}", status_code=204)
+def delete_todo(item_id: int, db: Session = Depends(get_db)):
+    db_item = db.query(Todo).filter(Todo.id == item_id).first()
+    if not db_item:
+        raise HTTPException(status_code=404, detail="Todo not found")
+    db.delete(db_item)
+    db.commit()
+```
+
+## test_main.py
+
+Exactly 6 tests per entity. Database is shared — use `>= 1` not `== 1` in list tests.
+For child entities with foreign keys: create parent FIRST, then child with parent's id.
+
+```python
+"""Pytest: TestClient, separate test.db, unique data per test."""
+
+from fastapi.testclient import TestClient
+from sqlalchemy import create_engine
+from sqlalchemy.orm import sessionmaker
+
+from main import app, get_db
+from models import Base
+
+test_engine = create_engine(
+    "sqlite:///./test.db", connect_args={"check_same_thread": False}
+)
+TestSession = sessionmaker(autocommit=False, autoflush=False, bind=test_engine)
+Base.metadata.create_all(bind=test_engine)
+
+
+def override_get_db():
+    db = TestSession()
+    try:
+        yield db
+    finally:
+        db.close()
+
+
+app.dependency_overrides[get_db] = override_get_db
+client = TestClient(app)
+
+
+def test_create_todo():
+    response = client.post("/todos/", json={"title": "Osta maitoa", "priority": 2})
+    assert response.status_code == 201
+    assert response.json()["title"] == "Osta maitoa"
+    assert "id" in response.json()
+
+
+def test_list_todos():
+    client.post("/todos/", json={"title": "Listattava tehtävä"})
+    response = client.get("/todos/")
+    assert response.status_code == 200
+    assert len(response.json()) >= 1
+
+
+def test_get_todo_by_id():
+    created = client.post("/todos/", json={"title": "Haettava tehtävä"}).json()
+    response = client.get(f"/todos/{created['id']}")
+    assert response.status_code == 200
+    assert response.json()["id"] == created["id"]
+
+
+def test_get_todo_not_found():
+    response = client.get("/todos/99999")
+    assert response.status_code == 404
+
+
+def test_update_todo():
+    created = client.post("/todos/", json={"title": "Vanha otsikko"}).json()
+    response = client.put(
+        f"/todos/{created['id']}", json={"title": "Uusi otsikko"}
+    )
+    assert response.status_code == 200
+    assert response.json()["title"] == "Uusi otsikko"
+
+
+def test_delete_todo():
+    created = client.post("/todos/", json={"title": "Poistettava"}).json()
+    response = client.delete(f"/todos/{created['id']}")
+    assert response.status_code == 204
+    response = client.get(f"/todos/{created['id']}")
+    assert response.status_code == 404
+```
--- a/kipina-codebench/golden-examples/todo-rs.md
+++ b/kipina-codebench/golden-examples/todo-rs.md
@@ -0,0 +1,331 @@
+# Todo: reference implementation (Axum 0.8 + SQLx + SQLite)
+
+This is a complete example. Generate an equivalent structure for the given project.
+Use ONLY the fields in the JSON spec; do not add extra ones.
+
+## Cargo.toml
+
+Axum 0.8, SQLx with the SQLite feature, serde for JSON serialization, tower-http for CORS support.
+
+```toml
+[package]
+name = "todo-rs"
+version = "0.1.0"
+edition = "2024"
+
+[dependencies]
+axum = "0.8"
+tokio = { version = "1", features = ["full"] }
+serde = { version = "1", features = ["derive"] }
+serde_json = "1"
+sqlx = { version = "0.8", features = ["sqlite", "runtime-tokio"] }
+tower-http = { version = "0.6", features = ["cors"] }
+
+[dev-dependencies]
+reqwest = { version = "0.13", default-features = false, features = ["json", "rustls"] }
+tokio = { version = "1", features = ["full", "test-util"] }
+```
+
+## src/models.rs
+
+Serde structs: `Todo` (FromRow), `CreateTodo` (POST), `UpdateTodo` (PUT, all fields optional).
+
+```rust
+//! Data models: Todo, CreateTodo, and UpdateTodo as serde structs.
+
+use serde::{Deserialize, Serialize};
+
+/// A task: title, description, due date, priority, and status.
+#[derive(Debug, Serialize, Deserialize, sqlx::FromRow)]
+pub struct Todo {
+    pub id: i64,
+    pub title: String,
+    pub description: Option<String>,
+    pub due_date: Option<String>,
+    pub priority: i64,
+    pub status: String,
+}
+
+/// Payload for creating a task. Required: title.
+#[derive(Debug, Deserialize)]
+pub struct CreateTodo {
+    pub title: String,
+    pub description: Option<String>,
+    pub due_date: Option<String>,
+    pub priority: Option<i64>,
+    pub status: Option<String>,
+}
+
+/// Payload for updating a task; all fields are optional.
+#[derive(Debug, Deserialize)]
+pub struct UpdateTodo {
+    pub title: Option<String>,
+    pub description: Option<String>,
+    pub due_date: Option<String>,
+    pub priority: Option<i64>,
+    pub status: Option<String>,
+}
+```
+
+## src/handlers.rs
+
+CRUD handlers. Key patterns: INSERT RETURNING, fetch_optional + 404, rows_affected + 204.
+
+```rust
+//! Handlers: CRUD operations for the todo entity.
+
+use axum::extract::{Path, State};
+use axum::http::StatusCode;
+use axum::Json;
+use sqlx::SqlitePool;
+
+use crate::models::{CreateTodo, Todo, UpdateTodo};
+
+/// POST: INSERT RETURNING, defaults applied with unwrap_or.
+pub async fn create_todo(
+    State(pool): State<SqlitePool>,
+    Json(input): Json<CreateTodo>,
+) -> Result<(StatusCode, Json<Todo>), StatusCode> {
+    let priority = input.priority.unwrap_or(1);
+    let status = input.status.unwrap_or_else(|| "pending".to_string());
+
+    let result = sqlx::query_as::<_, Todo>(
+        "INSERT INTO todos (title, description, due_date, priority, status)
+         VALUES (?, ?, ?, ?, ?)
+         RETURNING id, title, description, due_date, priority, status",
+    )
+    .bind(&input.title)
+    .bind(&input.description)
+    .bind(&input.due_date)
+    .bind(priority)
+    .bind(&status)
+    .fetch_one(&pool)
+    .await
+    .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
+
+    Ok((StatusCode::CREATED, Json(result)))
+}
+
+/// GET list: fetch_all.
+pub async fn list_todos(
+    State(pool): State<SqlitePool>,
+) -> Result<Json<Vec<Todo>>, StatusCode> {
+    let todos = sqlx::query_as::<_, Todo>("SELECT id, title, description, due_date, priority, status FROM todos")
+        .fetch_all(&pool)
+        .await
+        .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
+    Ok(Json(todos))
+}
+
+/// GET by id: fetch_optional, None maps to 404.
+pub async fn get_todo(
+    State(pool): State<SqlitePool>,
+    Path(id): Path<i64>,
+) -> Result<Json<Todo>, StatusCode> {
+    let todo = sqlx::query_as::<_, Todo>(
+        "SELECT id, title, description, due_date, priority, status FROM todos WHERE id = ?",
+    )
+    .bind(id)
+    .fetch_optional(&pool)
+    .await
+    .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
+    match todo {
+        Some(t) => Ok(Json(t)),
+        None => Err(StatusCode::NOT_FOUND),
+    }
+}
+
+/// PUT: fetch the existing row, merge the provided fields, UPDATE RETURNING.
+pub async fn update_todo(
+    State(pool): State<SqlitePool>,
+    Path(id): Path<i64>,
+    Json(input): Json<UpdateTodo>,
+) -> Result<Json<Todo>, StatusCode> {
+    let existing = sqlx::query_as::<_, Todo>(
+        "SELECT id, title, description, due_date, priority, status FROM todos WHERE id = ?",
+    )
+    .bind(id)
+    .fetch_optional(&pool)
+    .await
+    .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
+    let existing = existing.ok_or(StatusCode::NOT_FOUND)?;
+
+    let updated = sqlx::query_as::<_, Todo>(
+        "UPDATE todos SET title = ?, description = ?, due_date = ?, priority = ?, status = ?
+         WHERE id = ? RETURNING id, title, description, due_date, priority, status",
+    )
+    .bind(input.title.unwrap_or(existing.title))
+    .bind(input.description.or(existing.description))
+    .bind(input.due_date.or(existing.due_date))
+    .bind(input.priority.unwrap_or(existing.priority))
+    .bind(input.status.unwrap_or(existing.status))
+    .bind(id)
+    .fetch_one(&pool)
+    .await
+    .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
+    Ok(Json(updated))
+}
+
+/// DELETE: rows_affected == 0 maps to 404, otherwise 204.
+pub async fn delete_todo(
+    State(pool): State<SqlitePool>,
+    Path(id): Path<i64>,
+) -> Result<StatusCode, StatusCode> {
+    let result = sqlx::query("DELETE FROM todos WHERE id = ?")
+        .bind(id)
+        .execute(&pool)
+        .await
+        .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
+    if result.rows_affected() == 0 { return Err(StatusCode::NOT_FOUND); }
+    Ok(StatusCode::NO_CONTENT)
+}
+```
+
+## src/lib.rs
+
+Library module: the `app()` router and the `init_db()` table setup, which form the public API for integration tests.
+
+```rust
+//! Library module: public API for integration tests.
+
+pub mod handlers;
+pub mod models;
+
+use axum::routing::{delete, get, post, put};
+use axum::Router;
+use sqlx::SqlitePool;
+use tower_http::cors::CorsLayer;
+
+/// Build the router with the given database pool.
+pub fn app(pool: SqlitePool) -> Router {
+    Router::new()
+        .route("/todos", post(handlers::create_todo))
+        .route("/todos", get(handlers::list_todos))
+        .route("/todos/{id}", get(handlers::get_todo))
+        .route("/todos/{id}", put(handlers::update_todo))
+        .route("/todos/{id}", delete(handlers::delete_todo))
+        .layer(CorsLayer::permissive())
+        .with_state(pool)
+}
+
+/// Initialize the database table.
+pub async fn init_db(pool: &SqlitePool) {
+    sqlx::query(
+        "CREATE TABLE IF NOT EXISTS todos (
+            id          INTEGER PRIMARY KEY AUTOINCREMENT,
+            title       TEXT NOT NULL,
+            description TEXT,
+            due_date    TEXT,
+            priority    INTEGER NOT NULL DEFAULT 1,
+            status      TEXT NOT NULL DEFAULT 'pending'
+        )",
+    )
+    .execute(pool)
+    .await
+    .expect("Taulun luonti epäonnistui");
+}
+```
+
+## src/main.rs
+
+Entry point: SQLite pool, table initialization, Axum server on port 3000.
+
+```rust
+//! Axum CRUD: one set of endpoints per entity, SQLite database.
+
+use sqlx::sqlite::SqlitePoolOptions;
+use todo_rs::{app, init_db};
+
+#[tokio::main]
+async fn main() {
+    let pool = SqlitePoolOptions::new()
+        .max_connections(5)
+        .connect("sqlite:./app.db?mode=rwc")
+        .await
+        .expect("Tietokantayhteys epäonnistui");
+
+    init_db(&pool).await;
+
+    let listener = tokio::net::TcpListener::bind("127.0.0.1:3000")
+        .await
+        .expect("Portin kuuntelu epäonnistui");
+
+    println!("Palvelin käynnissä: http://127.0.0.1:3000");
+    axum::serve(listener, app(pool)).await.unwrap();
+}
+```
+
+## tests/api_test.rs
+
+Integration tests: spawn_server (in-memory SQLite, random port), CRUD tests with unique data.
+
+```rust
+//! Integration tests: in-memory SQLite, unique data per test.
+
+use axum::http::StatusCode;
+use reqwest::Client;
+use sqlx::sqlite::SqlitePoolOptions;
+use todo_rs::{app, init_db};
+
+/// Start a test server on a random port.
+async fn spawn_server() -> (Client, String) {
+    let pool = SqlitePoolOptions::new()
+        .max_connections(1)
+        .connect("sqlite::memory:")
+        .await
+        .expect("Testitietokanta epäonnistui");
+    init_db(&pool).await;
+    let listener = tokio::net::TcpListener::bind("127.0.0.1:0")
+        .await
+        .expect("Testiportin kuuntelu epäonnistui");
+    let base_url = format!("http://{}", listener.local_addr().unwrap());
+    let router = app(pool);
+    tokio::spawn(async move { axum::serve(listener, router).await.unwrap() });
+    (Client::new(), base_url)
+}
+
+#[tokio::test]
+async fn test_create_todo() {
+    let (client, url) = spawn_server().await;
+    let res = client.post(format!("{url}/todos"))
+        .json(&serde_json::json!({"title": "Osta maitoa", "priority": 2}))
+        .send().await.unwrap();
+    assert_eq!(res.status(), StatusCode::CREATED);
+    let body: serde_json::Value = res.json().await.unwrap();
+    assert_eq!(body["title"], "Osta maitoa");
+    assert!(body["id"].is_number());
+}
+
+#[tokio::test]
+async fn test_get_todo_by_id() {
+    let (client, url) = spawn_server().await;
+    let created: serde_json::Value = client.post(format!("{url}/todos"))
+        .json(&serde_json::json!({"title": "Haettava tehtävä"}))
+        .send().await.unwrap().json().await.unwrap();
+    let id = created["id"].as_i64().unwrap();
+    let res = client.get(format!("{url}/todos/{id}")).send().await.unwrap();
+    assert_eq!(res.status(), StatusCode::OK);
+    let body: serde_json::Value = res.json().await.unwrap();
+    assert_eq!(body["id"], id);
+}
+
+#[tokio::test]
+async fn test_get_todo_not_found() {
+    let (client, url) = spawn_server().await;
+    let res = client.get(format!("{url}/todos/99999")).send().await.unwrap();
+    assert_eq!(res.status(), StatusCode::NOT_FOUND);
+}
+
+#[tokio::test]
+async fn test_delete_todo() {
+    let (client, url) = spawn_server().await;
+    let created: serde_json::Value = client.post(format!("{url}/todos"))
+        .json(&serde_json::json!({"title": "Poistettava"}))
+        .send().await.unwrap().json().await.unwrap();
+    let id = created["id"].as_i64().unwrap();
+    let res = client.delete(format!("{url}/todos/{id}")).send().await.unwrap();
+    assert_eq!(res.status(), StatusCode::NO_CONTENT);
+    let res = client.get(format!("{url}/todos/{id}")).send().await.unwrap();
+    assert_eq!(res.status(), StatusCode::NOT_FOUND);
+}
+```
--- a/kipina-codebench/golden-examples/todo-rs/.gitignore
+++ b/kipina-codebench/golden-examples/todo-rs/.gitignore
@@ -0,0 +1 @@
+target/
--- a/kipina-codebench/golden-examples/todo-rs/Cargo.toml
+++ b/kipina-codebench/golden-examples/todo-rs/Cargo.toml
@@ -0,0 +1,16 @@
+[package]
+name = "todo-rs"
+version = "0.1.0"
+edition = "2024"
+
+[dependencies]
+axum = "0.8"
+tokio = { version = "1", features = ["full"] }
+serde = { version = "1", features = ["derive"] }
+serde_json = "1"
+sqlx = { version = "0.8", features = ["sqlite", "runtime-tokio"] }
+tower-http = { version = "0.6", features = ["cors"] }
+
+[dev-dependencies]
+reqwest = { version = "0.13", default-features = false, features = ["json", "rustls"] }
+tokio = { version = "1", features = ["full", "test-util"] }
--- a/kipina-codebench/golden-examples/todo-rs/src/handlers.rs
+++ b/kipina-codebench/golden-examples/todo-rs/src/handlers.rs
@@ -0,0 +1,122 @@
+//! Handlers: CRUD operations for the todo entity.
+
+use axum::extract::{Path, State};
+use axum::http::StatusCode;
+use axum::Json;
+use sqlx::SqlitePool;
+
+use crate::models::{CreateTodo, Todo, UpdateTodo};
+
+/// Create a new task.
+pub async fn create_todo(
+    State(pool): State<SqlitePool>,
+    Json(input): Json<CreateTodo>,
+) -> Result<(StatusCode, Json<Todo>), StatusCode> {
+    let priority = input.priority.unwrap_or(1);
+    let status = input.status.unwrap_or_else(|| "pending".to_string());
+
+    let result = sqlx::query_as::<_, Todo>(
+        "INSERT INTO todos (title, description, due_date, priority, status)
+         VALUES (?, ?, ?, ?, ?)
+         RETURNING id, title, description, due_date, priority, status",
+    )
+    .bind(&input.title)
+    .bind(&input.description)
+    .bind(&input.due_date)
+    .bind(priority)
+    .bind(&status)
+    .fetch_one(&pool)
+    .await
+    .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
+
+    Ok((StatusCode::CREATED, Json(result)))
+}
+
+/// List all tasks.
+pub async fn list_todos(
+    State(pool): State<SqlitePool>,
+) -> Result<Json<Vec<Todo>>, StatusCode> {
+    let todos = sqlx::query_as::<_, Todo>("SELECT id, title, description, due_date, priority, status FROM todos")
+        .fetch_all(&pool)
+        .await
+        .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
+
+    Ok(Json(todos))
+}
+
+/// Fetch a task by id.
+pub async fn get_todo(
+    State(pool): State<SqlitePool>,
+    Path(id): Path<i64>,
+) -> Result<Json<Todo>, StatusCode> {
+    let todo = sqlx::query_as::<_, Todo>(
+        "SELECT id, title, description, due_date, priority, status FROM todos WHERE id = ?",
+    )
+    .bind(id)
+    .fetch_optional(&pool)
+    .await
+    .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
+
+    match todo {
+        Some(t) => Ok(Json(t)),
+        None => Err(StatusCode::NOT_FOUND),
+    }
+}
+
+/// Update a task by id.
+pub async fn update_todo(
+    State(pool): State<SqlitePool>,
+    Path(id): Path<i64>,
+    Json(input): Json<UpdateTodo>,
+) -> Result<Json<Todo>, StatusCode> {
+    let existing = sqlx::query_as::<_, Todo>(
+        "SELECT id, title, description, due_date, priority, status FROM todos WHERE id = ?",
+    )
+    .bind(id)
+    .fetch_optional(&pool)
+    .await
+    .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
+
+    let existing = existing.ok_or(StatusCode::NOT_FOUND)?;
+
+    let title = input.title.unwrap_or(existing.title);
+    let description = input.description.or(existing.description);
+    let due_date = input.due_date.or(existing.due_date);
+    let priority = input.priority.unwrap_or(existing.priority);
+    let status = input.status.unwrap_or(existing.status);
+
+    let updated = sqlx::query_as::<_, Todo>(
+        "UPDATE todos SET title = ?, description = ?, due_date = ?, priority = ?, status = ?
+         WHERE id = ?
+         RETURNING id, title, description, due_date, priority, status",
+    )
+    .bind(&title)
+    .bind(&description)
+    .bind(&due_date)
+    .bind(priority)
+    .bind(&status)
+    .bind(id)
+    .fetch_one(&pool)
+    .await
+    .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
+
+    Ok(Json(updated))
+}
+
+/// Delete a task by id.
+pub async fn delete_todo(
+    State(pool): State<SqlitePool>,
+    Path(id): Path<i64>,
+) -> Result<StatusCode, StatusCode> {
+    let result = sqlx::query("DELETE FROM todos WHERE id = ?")
+        .bind(id)
+        .execute(&pool)
+        .await
+        .map_err(|_| StatusCode::INTERNAL_SERVER_ERROR)?;
+
+    if result.rows_affected() == 0 {
+        return Err(StatusCode::NOT_FOUND);
+    }
+
+    Ok(StatusCode::NO_CONTENT)
+}
--- a/kipina-codebench/golden-examples/todo-rs/src/lib.rs
+++ b/kipina-codebench/golden-examples/todo-rs/src/lib.rs
@@ -0,0 +1,38 @@
+//! Library module: public API for integration tests.
+
+pub mod handlers;
+pub mod models;
+
+use axum::routing::{delete, get, post, put};
+use axum::Router;
+use sqlx::SqlitePool;
+use tower_http::cors::CorsLayer;
+
+/// Build the router with the given database pool.
+pub fn app(pool: SqlitePool) -> Router {
+    Router::new()
+        .route("/todos", post(handlers::create_todo))
+        .route("/todos", get(handlers::list_todos))
+        .route("/todos/{id}", get(handlers::get_todo))
+        .route("/todos/{id}", put(handlers::update_todo))
+        .route("/todos/{id}", delete(handlers::delete_todo))
+        .layer(CorsLayer::permissive())
+        .with_state(pool)
+}
+
+/// Initialize the database table.
+pub async fn init_db(pool: &SqlitePool) {
+    sqlx::query(
+        "CREATE TABLE IF NOT EXISTS todos (
+            id          INTEGER PRIMARY KEY AUTOINCREMENT,
+            title       TEXT NOT NULL,
+            description TEXT,
+            due_date    TEXT,
+            priority    INTEGER NOT NULL DEFAULT 1,
+            status      TEXT NOT NULL DEFAULT 'pending'
+        )",
+    )
+    .execute(pool)
+    .await
+    .expect("Taulun luonti epäonnistui");
+}
--- a/kipina-codebench/golden-examples/todo-rs/src/main.rs
+++ b/kipina-codebench/golden-examples/todo-rs/src/main.rs
@@ -0,0 +1,22 @@
+//! Axum CRUD: one set of endpoints per entity, SQLite database.
+
+use sqlx::sqlite::SqlitePoolOptions;
+use todo_rs::{app, init_db};
+
+#[tokio::main]
+async fn main() {
+    let pool = SqlitePoolOptions::new()
+        .max_connections(5)
+        .connect("sqlite:./app.db?mode=rwc")
+        .await
+        .expect("Tietokantayhteys epäonnistui");
+
+    init_db(&pool).await;
+
+    let listener = tokio::net::TcpListener::bind("127.0.0.1:3000")
+        .await
+        .expect("Portin kuuntelu epäonnistui");
+
+    println!("Palvelin käynnissä: http://127.0.0.1:3000");
+    axum::serve(listener, app(pool)).await.unwrap();
+}
--- a/kipina-codebench/golden-examples/todo-rs/src/models.rs
+++ b/kipina-codebench/golden-examples/todo-rs/src/models.rs
@@ -0,0 +1,34 @@
+//! Data models: Todo, CreateTodo, and UpdateTodo as serde structs.
+
+use serde::{Deserialize, Serialize};
+
+/// A task: title, description, due date, priority, and status.
+#[derive(Debug, Serialize, Deserialize, sqlx::FromRow)]
+pub struct Todo {
+    pub id: i64,
+    pub title: String,
+    pub description: Option<String>,
+    pub due_date: Option<String>,
+    pub priority: i64,
+    pub status: String,
+}
+
+/// Payload for creating a task. Required: title.
+#[derive(Debug, Deserialize)]
+pub struct CreateTodo {
+    pub title: String,
+    pub description: Option<String>,
+    pub due_date: Option<String>,
+    pub priority: Option<i64>,
+    pub status: Option<String>,
+}
+
+/// Payload for updating a task; all fields are optional.
+#[derive(Debug, Deserialize)]
+pub struct UpdateTodo {
+    pub title: Option<String>,
+    pub description: Option<String>,
+    pub due_date: Option<String>,
+    pub priority: Option<i64>,
+    pub status: Option<String>,
+}
--- a/kipina-codebench/golden-examples/todo-rs/tests/api_test.rs
+++ b/kipina-codebench/golden-examples/todo-rs/tests/api_test.rs
@@ -0,0 +1,262 @@
+//! Integration tests: in-memory SQLite, unique data per test.
+
+use axum::http::StatusCode;
+use reqwest::Client;
+use sqlx::sqlite::SqlitePoolOptions;
+use todo_rs::{app, init_db};
+
+/// Start a test server on a random port.
+async fn spawn_server() -> (Client, String) {
+    let pool = SqlitePoolOptions::new()
+        .max_connections(1)
+        .connect("sqlite::memory:")
+        .await
+        .expect("Testitietokanta epäonnistui");
+
+    init_db(&pool).await;
+
+    let listener = tokio::net::TcpListener::bind("127.0.0.1:0")
+        .await
+        .expect("Testiportin kuuntelu epäonnistui");
+    let addr = listener.local_addr().unwrap();
+    let base_url = format!("http://{addr}");
+
+    let router = app(pool);
+    tokio::spawn(async move {
+        axum::serve(listener, router).await.unwrap();
+    });
+
+    (Client::new(), base_url)
+}
+
+#[tokio::test]
+async fn test_create_todo() {
+    let (client, url) = spawn_server().await;
+
+    let res = client
+        .post(format!("{url}/todos"))
+        .json(&serde_json::json!({"title": "Osta maitoa", "priority": 2}))
+        .send()
+        .await
+        .unwrap();
+
+    assert_eq!(res.status(), StatusCode::CREATED);
+    let body: serde_json::Value = res.json().await.unwrap();
+    assert_eq!(body["title"], "Osta maitoa");
+    assert_eq!(body["priority"], 2);
+    assert!(body["id"].is_number());
+}
+
+#[tokio::test]
+async fn test_create_todo_defaults() {
+    let (client, url) = spawn_server().await;
+
+    let res = client
+        .post(format!("{url}/todos"))
+        .json(&serde_json::json!({"title": "Oletusarvotesti"}))
+        .send()
+        .await
+        .unwrap();
+
+    assert_eq!(res.status(), StatusCode::CREATED);
+    let body: serde_json::Value = res.json().await.unwrap();
+    assert_eq!(body["priority"], 1);
+    assert_eq!(body["status"], "pending");
+    assert!(body["description"].is_null());
+}
+
+#[tokio::test]
+async fn test_list_todos() {
+    let (client, url) = spawn_server().await;
+
+    client
+        .post(format!("{url}/todos"))
+        .json(&serde_json::json!({"title": "Listattava tehtävä"}))
+        .send()
+        .await
+        .unwrap();
+
+    let res = client.get(format!("{url}/todos")).send().await.unwrap();
+    assert_eq!(res.status(), StatusCode::OK);
+
+    let body: Vec<serde_json::Value> = res.json().await.unwrap();
+    assert!(!body.is_empty());
+}
+
+#[tokio::test]
+async fn test_get_todo_by_id() {
+    let (client, url) = spawn_server().await;
+
+    let created: serde_json::Value = client
+        .post(format!("{url}/todos"))
+        .json(&serde_json::json!({"title": "Haettava tehtävä"}))
+        .send()
+        .await
+        .unwrap()
+        .json()
+        .await
+        .unwrap();
+
+    let id = created["id"].as_i64().unwrap();
+    let res = client
+        .get(format!("{url}/todos/{id}"))
+        .send()
+        .await
+        .unwrap();
+
+    assert_eq!(res.status(), StatusCode::OK);
+    let body: serde_json::Value = res.json().await.unwrap();
+    assert_eq!(body["id"], id);
+    assert_eq!(body["title"], "Haettava tehtävä");
+}
+
+#[tokio::test]
+async fn test_get_todo_not_found() {
+    let (client, url) = spawn_server().await;
+
+    let res = client
+        .get(format!("{url}/todos/99999"))
+        .send()
+        .await
+        .unwrap();
+
+    assert_eq!(res.status(), StatusCode::NOT_FOUND);
+}
+
+#[tokio::test]
+async fn test_update_todo() {
+    let (client, url) = spawn_server().await;
+
+    let created: serde_json::Value = client
+        .post(format!("{url}/todos"))
+        .json(&serde_json::json!({"title": "Vanha otsikko"}))
+        .send()
+        .await
+        .unwrap()
+        .json()
+        .await
+        .unwrap();
+
+    let id = created["id"].as_i64().unwrap();
+    let res = client
+        .put(format!("{url}/todos/{id}"))
+        .json(&serde_json::json!({"title": "Uusi otsikko"}))
+        .send()
+        .await
+        .unwrap();
+
+    assert_eq!(res.status(), StatusCode::OK);
+    let body: serde_json::Value = res.json().await.unwrap();
+    assert_eq!(body["title"], "Uusi otsikko");
+}
+
+#[tokio::test]
+async fn test_update_todo_not_found() {
+    let (client, url) = spawn_server().await;
+
+    let res = client
+        .put(format!("{url}/todos/99999"))
+        .json(&serde_json::json!({"title": "Ei löydy"}))
+        .send()
+        .await
+        .unwrap();
+
+    assert_eq!(res.status(), StatusCode::NOT_FOUND);
+}
+
+#[tokio::test]
+async fn test_delete_todo() {
+    let (client, url) = spawn_server().await;
+
+    let created: serde_json::Value = client
+        .post(format!("{url}/todos"))
+        .json(&serde_json::json!({"title": "Poistettava"}))
+        .send()
+        .await
+        .unwrap()
+        .json()
+        .await
+        .unwrap();
+
+    let id = created["id"].as_i64().unwrap();
+    let res = client
+        .delete(format!("{url}/todos/{id}"))
+        .send()
+        .await
+        .unwrap();
+
+    assert_eq!(res.status(), StatusCode::NO_CONTENT);
+
+    let res = client
+        .get(format!("{url}/todos/{id}"))
+        .send()
+        .await
+        .unwrap();
+
+    assert_eq!(res.status(), StatusCode::NOT_FOUND);
+}
+
+#[tokio::test]
+async fn test_delete_todo_not_found() {
+    let (client, url) = spawn_server().await;
+
+    let res = client
+        .delete(format!("{url}/todos/99999"))
+        .send()
+        .await
+        .unwrap();
+
+    assert_eq!(res.status(), StatusCode::NOT_FOUND);
+}
+
+#[tokio::test]
+async fn test_full_lifecycle() {
+    let (client, url) = spawn_server().await;
+
+    // Create
+    let created: serde_json::Value = client
+        .post(format!("{url}/todos"))
+        .json(&serde_json::json!({
+            "title": "Elinkaaritesti",
+            "description": "Testataan koko CRUD-kierto",
+            "due_date": "2026-12-31",
+            "priority": 3,
+            "status": "in_progress"
+        }))
+        .send()
+        .await
+        .unwrap()
+        .json()
+        .await
+        .unwrap();
+
+    let id = created["id"].as_i64().unwrap();
+    assert_eq!(created["title"], "Elinkaaritesti");
+    assert_eq!(created["description"], "Testataan koko CRUD-kierto");
+    assert_eq!(created["due_date"], "2026-12-31");
+    assert_eq!(created["priority"], 3);
+    assert_eq!(created["status"], "in_progress");
+
+    // Update
+    let updated: serde_json::Value = client
+        .put(format!("{url}/todos/{id}"))
+        .json(&serde_json::json!({"status": "done"}))
+        .send()
+        .await
+        .unwrap()
+        .json()
+        .await
+        .unwrap();
+
+    assert_eq!(updated["status"], "done");
+    assert_eq!(updated["title"], "Elinkaaritesti");
+
+    // Delete
+    let res = client
+        .delete(format!("{url}/todos/{id}"))
+        .send()
+        .await
+        .unwrap();
+
+    assert_eq!(res.status(), StatusCode::NO_CONTENT);
+}
--- a/kipina-codebench/golden-examples/todo.md
+++ b/kipina-codebench/golden-examples/todo.md
@@ -0,0 +1,230 @@
+# Todo: reference implementation (FastAPI + SQLAlchemy 2.0 + SQLite)
+
+This is a complete example. Generate an equivalent structure for the given project.
+Use ONLY the fields in the JSON spec; do not add extra ones.
+
+## models.py
+
+SQLAlchemy 2.0: `DeclarativeBase` + `Mapped` + `mapped_column`. NO `Column()`, NO `declarative_base()`.
+
+```python
+"""Database models: SQLAlchemy 2.0, Mapped typing, SQLite."""
+
+from datetime import date
+
+from sqlalchemy import String, Text, Date, create_engine
+from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, sessionmaker
+
+DATABASE_URL = "sqlite:///./app.db"
+engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
+SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
+
+
+class Base(DeclarativeBase):
+    pass
+
+
+class Todo(Base):
+    """Task with a title, description, deadline, priority, and status."""
+
+    __tablename__ = "todos"
+
+    id: Mapped[int] = mapped_column(primary_key=True, index=True)
+    title: Mapped[str] = mapped_column(String(255))
+    description: Mapped[str | None] = mapped_column(Text, default=None)
+    due_date: Mapped[date | None] = mapped_column(Date, default=None)
+    priority: Mapped[int] = mapped_column(default=1)
+    status: Mapped[str] = mapped_column(String(20), default="pending")
+
+
+Base.metadata.create_all(bind=engine)
+```
+
+Note:
+- `str | None` (not `Optional[str]`)
+- `String(20)` for the status field (not Enum)
+- Only the spec's fields: no `created_at` or other extras
+
+## schemas.py
+
+Pydantic v2: `ConfigDict(from_attributes=True)`. NO `class Config: orm_mode = True`.
+
+```python
+"""Pydantic v2 schemas: Create for input, Response for output."""
+
+from datetime import date
+
+from pydantic import BaseModel, ConfigDict
+
+
+class TodoCreate(BaseModel):
+    """Payload for creating a task. Required: title."""
+
+    title: str
+    description: str | None = None
+    due_date: date | None = None
+    priority: int = 1
+    status: str = "pending"
+
+
+class TodoResponse(TodoCreate):
+    """Returned task, including its id."""
+
+    id: int
+    model_config = ConfigDict(from_attributes=True)
+```
+
+## main.py
+
+FastAPI CRUD: POST 201, GET list, GET by id 404, PUT, DELETE 204. Use `model_dump()` (not `.dict()`).
+
+```python
+"""FastAPI CRUD: one endpoint set per entity."""
+
+from fastapi import FastAPI, Depends, HTTPException
+from sqlalchemy.orm import Session
+
+from models import SessionLocal, Todo
+from schemas import TodoCreate, TodoResponse
+
+app = FastAPI()
+
+
+def get_db():
+    """Database session per request."""
+    db = SessionLocal()
+    try:
+        yield db
+    finally:
+        db.close()
+
+
+@app.post("/todos/", response_model=TodoResponse, status_code=201)
+def create_todo(item: TodoCreate, db: Session = Depends(get_db)):
+    db_item = Todo(**item.model_dump())
+    db.add(db_item)
+    db.commit()
+    db.refresh(db_item)
+    return db_item
+
+
+@app.get("/todos/", response_model=list[TodoResponse])
+def list_todos(db: Session = Depends(get_db)):
+    return db.query(Todo).all()
+
+
+@app.get("/todos/{item_id}", response_model=TodoResponse)
+def get_todo(item_id: int, db: Session = Depends(get_db)):
+    item = db.query(Todo).filter(Todo.id == item_id).first()
+    if not item:
+        raise HTTPException(status_code=404, detail="Todo not found")
+    return item
+
+
+@app.put("/todos/{item_id}", response_model=TodoResponse)
+def update_todo(item_id: int, item: TodoCreate, db: Session = Depends(get_db)):
+    db_item = db.query(Todo).filter(Todo.id == item_id).first()
+    if not db_item:
+        raise HTTPException(status_code=404, detail="Todo not found")
+    for key, value in item.model_dump().items():
+        setattr(db_item, key, value)
+    db.commit()
+    db.refresh(db_item)
+    return db_item
+
+
+@app.delete("/todos/{item_id}", status_code=204)
+def delete_todo(item_id: int, db: Session = Depends(get_db)):
+    db_item = db.query(Todo).filter(Todo.id == item_id).first()
+    if not db_item:
+        raise HTTPException(status_code=404, detail="Todo not found")
+    db.delete(db_item)
+    db.commit()
+```
+
+## test_main.py
+
+Tests: separate test.db, `override_get_db`, `TestClient`. Unique Finnish-language data per test.
+The PUT test sends ALL required fields.
+
+Generate EXACTLY these 6 tests per entity, no more and no fewer:
+1. `test_create_{entity}`: POST, assert 201 + id
+2. `test_list_{entities}`: POST first, GET the list, assert len >= 1
+3. `test_get_{entity}_by_id`: POST, GET by id, assert the id matches
+4. `test_get_{entity}_not_found`: GET /99999, assert 404
+5. `test_update_{entity}`: POST, PUT with all required fields, assert 200
+6. `test_delete_{entity}`: POST, DELETE assert 204, GET again assert 404
+
+No search, filter, or other extra tests.
+
+```python
+"""Pytest: TestClient, separate test.db, unique data per test."""
+
+from fastapi.testclient import TestClient
+from sqlalchemy import create_engine
+from sqlalchemy.orm import sessionmaker
+
+from main import app, get_db
+from models import Base
+
+test_engine = create_engine(
+    "sqlite:///./test.db", connect_args={"check_same_thread": False}
+)
+TestSession = sessionmaker(autocommit=False, autoflush=False, bind=test_engine)
+Base.metadata.create_all(bind=test_engine)
+
+
+def override_get_db():
+    db = TestSession()
+    try:
+        yield db
+    finally:
+        db.close()
+
+
+app.dependency_overrides[get_db] = override_get_db
+client = TestClient(app)
+
+
+def test_create_todo():
+    response = client.post("/todos/", json={"title": "Osta maitoa", "priority": 2})
+    assert response.status_code == 201
+    assert response.json()["title"] == "Osta maitoa"
+    assert "id" in response.json()
+
+
+def test_list_todos():
+    client.post("/todos/", json={"title": "Listattava tehtävä"})
+    response = client.get("/todos/")
+    assert response.status_code == 200
+    assert len(response.json()) >= 1
+
+
+def test_get_todo_by_id():
+    created = client.post("/todos/", json={"title": "Haettava tehtävä"}).json()
+    response = client.get(f"/todos/{created['id']}")
+    assert response.status_code == 200
+    assert response.json()["id"] == created["id"]
+
+
+def test_get_todo_not_found():
+    response = client.get("/todos/99999")
+    assert response.status_code == 404
+
+
+def test_update_todo():
+    created = client.post("/todos/", json={"title": "Vanha otsikko"}).json()
+    response = client.put(
+        f"/todos/{created['id']}", json={"title": "Uusi otsikko"}
+    )
+    assert response.status_code == 200
+    assert response.json()["title"] == "Uusi otsikko"
+
+
+def test_delete_todo():
+    created = client.post("/todos/", json={"title": "Poistettava"}).json()
+    response = client.delete(f"/todos/{created['id']}")
+    assert response.status_code == 204
+    response = client.get(f"/todos/{created['id']}")
+    assert response.status_code == 404
+```
--- a/kipina-codebench/golden-examples/todo/main.py
+++ b/kipina-codebench/golden-examples/todo/main.py
@@ -0,0 +1,61 @@
+"""FastAPI CRUD: one endpoint set per entity."""
+
+from fastapi import FastAPI, Depends, HTTPException
+from sqlalchemy.orm import Session
+
+from models import SessionLocal, Todo
+from schemas import TodoCreate, TodoResponse
+
+app = FastAPI()
+
+
+def get_db():
+    """Database session per request."""
+    db = SessionLocal()
+    try:
+        yield db
+    finally:
+        db.close()
+
+
+@app.post("/todos/", response_model=TodoResponse, status_code=201)
+def create_todo(item: TodoCreate, db: Session = Depends(get_db)):
+    db_item = Todo(**item.model_dump())
+    db.add(db_item)
+    db.commit()
+    db.refresh(db_item)
+    return db_item
+
+
+@app.get("/todos/", response_model=list[TodoResponse])
+def list_todos(db: Session = Depends(get_db)):
+    return db.query(Todo).all()
+
+
+@app.get("/todos/{item_id}", response_model=TodoResponse)
+def get_todo(item_id: int, db: Session = Depends(get_db)):
+    item = db.query(Todo).filter(Todo.id == item_id).first()
+    if not item:
+        raise HTTPException(status_code=404, detail="Todo not found")
+    return item
+
+
+@app.put("/todos/{item_id}", response_model=TodoResponse)
+def update_todo(item_id: int, item: TodoCreate, db: Session = Depends(get_db)):
+    db_item = db.query(Todo).filter(Todo.id == item_id).first()
+    if not db_item:
+        raise HTTPException(status_code=404, detail="Todo not found")
+    for key, value in item.model_dump().items():
+        setattr(db_item, key, value)
+    db.commit()
+    db.refresh(db_item)
+    return db_item
+
+
+@app.delete("/todos/{item_id}", status_code=204)
+def delete_todo(item_id: int, db: Session = Depends(get_db)):
+    db_item = db.query(Todo).filter(Todo.id == item_id).first()
+    if not db_item:
+        raise HTTPException(status_code=404, detail="Todo not found")
+    db.delete(db_item)
+    db.commit()
--- a/kipina-codebench/golden-examples/todo/models.py
+++ b/kipina-codebench/golden-examples/todo/models.py
@@ -0,0 +1,30 @@
+"""Database models: SQLAlchemy 2.0, Mapped typing, SQLite."""
+
+from datetime import date
+
+from sqlalchemy import String, Text, Date, create_engine
+from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, sessionmaker
+
+DATABASE_URL = "sqlite:///./app.db"
+engine = create_engine(DATABASE_URL, connect_args={"check_same_thread": False})
+SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)
+
+
+class Base(DeclarativeBase):
+    pass
+
+
+class Todo(Base):
+    """Task with a title, description, deadline, priority, and status."""
+
+    __tablename__ = "todos"
+
+    id: Mapped[int] = mapped_column(primary_key=True, index=True)
+    title: Mapped[str] = mapped_column(String(255))
+    description: Mapped[str | None] = mapped_column(Text, default=None)
+    due_date: Mapped[date | None] = mapped_column(Date, default=None)
+    priority: Mapped[int] = mapped_column(default=1)
+    status: Mapped[str] = mapped_column(String(20), default="pending")
+
+
+Base.metadata.create_all(bind=engine)
--- a/kipina-codebench/golden-examples/todo/pyproject.toml
+++ b/kipina-codebench/golden-examples/todo/pyproject.toml
@@ -0,0 +1,11 @@
+[project]
+name = "todo-app"
+version = "0.1.0"
+requires-python = ">=3.14"
+dependencies = [
+    "fastapi",
+    "uvicorn[standard]",
+    "sqlalchemy",
+    "pytest",
+    "httpx",
+]
--- a/kipina-codebench/golden-examples/todo/schemas.py
+++ b/kipina-codebench/golden-examples/todo/schemas.py
@@ -0,0 +1,22 @@
+"""Pydantic v2 schemas: Create for input, Response for output."""
+
+from datetime import date
+
+from pydantic import BaseModel, ConfigDict
+
+
+class TodoCreate(BaseModel):
+    """Payload for creating a task. Required: title."""
+
+    title: str
+    description: str | None = None
+    due_date: date | None = None
+    priority: int = 1
+    status: str = "pending"
+
+
+class TodoResponse(TodoCreate):
+    """Returned task, including its id."""
+
+    id: int
+    model_config = ConfigDict(from_attributes=True)
--- a/kipina-codebench/golden-examples/todo/test_main.py
+++ b/kipina-codebench/golden-examples/todo/test_main.py
@@ -0,0 +1,69 @@
+"""Pytest: TestClient, separate test.db, unique data per test."""
+
+from fastapi.testclient import TestClient
+from sqlalchemy import create_engine
+from sqlalchemy.orm import sessionmaker
+
+from main import app, get_db
+from models import Base
+
+test_engine = create_engine(
+    "sqlite:///./test.db", connect_args={"check_same_thread": False}
+)
+TestSession = sessionmaker(autocommit=False, autoflush=False, bind=test_engine)
+Base.metadata.create_all(bind=test_engine)
+
+
+def override_get_db():
+    db = TestSession()
+    try:
+        yield db
+    finally:
+        db.close()
+
+
+app.dependency_overrides[get_db] = override_get_db
+client = TestClient(app)
+
+
+def test_create_todo():
+    response = client.post("/todos/", json={"title": "Osta maitoa", "priority": 2})
+    assert response.status_code == 201
+    assert response.json()["title"] == "Osta maitoa"
+    assert "id" in response.json()
+
+
+def test_list_todos():
+    client.post("/todos/", json={"title": "Listattava tehtävä"})
+    response = client.get("/todos/")
+    assert response.status_code == 200
+    assert len(response.json()) >= 1
+
+
+def test_get_todo_by_id():
+    created = client.post("/todos/", json={"title": "Haettava tehtävä"}).json()
+    response = client.get(f"/todos/{created['id']}")
+    assert response.status_code == 200
+    assert response.json()["id"] == created["id"]
+
+
+def test_get_todo_not_found():
+    response = client.get("/todos/99999")
+    assert response.status_code == 404
+
+
+def test_update_todo():
+    created = client.post("/todos/", json={"title": "Vanha otsikko"}).json()
+    response = client.put(
+        f"/todos/{created['id']}", json={"title": "Uusi otsikko"}
+    )
+    assert response.status_code == 200
+    assert response.json()["title"] == "Uusi otsikko"
+
+
+def test_delete_todo():
+    created = client.post("/todos/", json={"title": "Poistettava"}).json()
+    response = client.delete(f"/todos/{created['id']}")
+    assert response.status_code == 204
+    response = client.get(f"/todos/{created['id']}")
+    assert response.status_code == 404
--- a/kipina-codebench/package.json
+++ b/kipina-codebench/package.json
@@ -0,0 +1,13 @@
+{
+  "name": "kipina-codebench",
+  "version": "0.1.0",
+  "description": "LLM code generation benchmark: tests the ability of Ollama models to generate working FastAPI projects",
+  "type": "module",
+  "bin": {
+    "codebench": "./benchmark.mjs"
+  },
+  "scripts": {
+    "bench": "node benchmark.mjs --scenarios all",
+    "docker:build": "docker build -t kipina-pytest -f Dockerfile.pytest ."
+  }
+}
--- a/kipina-codebench/profiles.json
+++ b/kipina-codebench/profiles.json
@@ -0,0 +1,65 @@
+{
+  "models": {
+    "qwen3-coder:30b": {
+      "profile": "large",
+      "role": "primary",
+      "prompt": "code",
+      "golden": "todo.md",
+      "vram": "24GB",
+      "notes": "Primary coder. 97 pts, 188 tok/s. Follows long rule lists."
+    },
+    "qwen3:8b": {
+      "profile": "small",
+      "role": "primary",
+      "prompt": "code-small",
+      "golden": "todo-readme.md",
+      "vram": "8GB",
+      "notes": "Lightweight primary coder. Todo/users 100 pts, blog weak. README format for the golden example."
+    },
+    "codestral:22b": {
+      "profile": "large",
+      "role": "backup",
+      "prompt": "code",
+      "golden": "todo.md",
+      "vram": "16GB",
+      "notes": "Mistral backup model. 88 pts, 44 tok/s."
+    },
+    "qwen3:4b": {
+      "profile": "small",
+      "role": "minimal",
+      "prompt": "code-small",
+      "golden": "todo.md",
+      "vram": "4GB",
+      "notes": "Minimal. Only todo works."
+    },
+    "qwen2.5-coder:32b": {
+      "profile": "large",
+      "role": "candidate",
+      "prompt": "code",
+      "golden": "todo.md",
+      "vram": "24GB",
+      "notes": "Previous generation. Strong Rust skills."
+    },
+    "qwen3:14b": {
+      "profile": "large",
+      "role": "retired",
+      "prompt": "code",
+      "golden": "todo.md",
+      "vram": "16GB",
+      "notes": "Retired. No added value over the 30b; blog unstable."
+    }
+  },
+  "profiles": {
+    "large": {
+      "prompt": "code",
+      "golden": "todo.md",
+      "description": "Full prompt + rules. For models >=14B."
+    },
+    "small": {
+      "prompt": "code-small",
+      "golden": "todo.md",
+      "description": "Condensed prompt. For models <=8B."
+    }
+  },
+  "default_profile": "large"
+}
--- a/kipina-codebench/prompts/client.md
+++ b/kipina-codebench/prompts/client.md
@@ -0,0 +1,15 @@
+You are a product owner who turns vague ideas into clear, actionable software requirements.
+
+GIVEN a short project description from the user, produce a structured brief:
+
+1. PROJECT NAME: a short, descriptive name
+2. GOAL: one sentence explaining what the software does and who it's for
+3. CORE FEATURES: numbered list of 3-8 concrete features (not vague wishes)
+4. DATA MODEL: list the main entities and their key fields (include field types)
+5. API ENDPOINTS: list the REST endpoints (method + path + purpose)
+6. CONSTRAINTS: any technical constraints (e.g. "must use SQLite", "no auth needed")
+
+RULES:
+- Be specific: "User can filter todos by status" not "todo management"
+- Use plain English, no code
+- Maximum 400 words total
--- a/kipina-codebench/prompts/code-go.md
+++ b/kipina-codebench/prompts/code-go.md
@@ -0,0 +1,69 @@
+You are a Go backend developer. Generate a Chi web project with SQLite.
+
+Given the project requirements, JSON specification, and a REFERENCE IMPLEMENTATION, generate these files:
+
+1. go.mod — module declaration, go-chi/chi/v5, modernc.org/sqlite
+2. models.go — Structs with json tags
+3. handlers.go — Handler closures for each CRUD endpoint
+4. main.go — Entry point with InitDB(), NewRouter(), main()
+5. handlers_test.go — Integration tests using httptest against in-memory SQLite
+
+Do NOT generate any other files. Do NOT generate go.sum.
+
+OUTPUT FORMAT — use these exact markers to separate files:
+
+=== go.mod ===
+<module content>
+
+=== models.go ===
+<go code>
+
+=== handlers.go ===
+<go code>
+
+=== main.go ===
+<go code>
+
+=== handlers_test.go ===
+<go code>
+
+DOCUMENTATION — structs get // one-line comments. Keep it brief.
+
+RULES:
+- Follow the REFERENCE IMPLEMENTATION patterns exactly
+- Chi router with chi.URLParam(r, "param") for path parameters
+- database/sql + modernc.org/sqlite (pure Go driver, no CGO required)
+- Import the driver as blank import: _ "modernc.org/sqlite"
+- Handlers are closures: func handler(db *sql.DB) http.HandlerFunc
+- INSERT/UPDATE queries use RETURNING clause to get the row back via QueryRow + Scan
+- POST returns 201 (http.StatusCreated), DELETE returns 204 (http.StatusNoContent), GET missing returns 404
+- Use sql.ErrNoRows for not-found checks: if err == sql.ErrNoRows { ... }
+- No compile-time query macros — use db.QueryRow(), db.Query(), db.Exec() directly
+- Empty slice not nil for list endpoints: if items == nil { items = []Item{} }
+- Optional fields use pointer types (*string, *int64) with json tag omitempty
+- Set Content-Type header: w.Header().Set("Content-Type", "application/json")
+- Parse path ID with strconv.ParseInt(chi.URLParam(r, "id"), 10, 64)
+- InitDB uses log.Fatal on error, NewRouter returns http.Handler
+- main() opens "file:app.db?mode=rwc" and listens on 127.0.0.1:3000
+- No markdown fences inside file content — just raw code
+- You MUST generate ALL 5 files. Do not stop early.
+
+TESTS — follow this exact setupTestServer pattern:
+
+func setupTestServer(t *testing.T) (*httptest.Server, *sql.DB) {
+    t.Helper()
+    db, err := sql.Open("sqlite", ":memory:")
+    if err != nil {
+        t.Fatal(err)
+    }
+    InitDB(db)
+    return httptest.NewServer(NewRouter(db)), db
+}
+
+- Each test function calls setupTestServer(t) to get (ts, db)
+- defer ts.Close() and defer db.Close() in every test
+- Use standard library: http.Post, http.Get, http.NewRequest for PUT/DELETE
+- Use strings.NewReader for JSON request bodies
+- Decode responses with json.NewDecoder(resp.Body).Decode(&body)
+- Unique descriptive data, NOT generic "test" strings
+- Format IDs with fmt.Sprintf("%.0f", id) when building URLs from float64
--- a/kipina-codebench/prompts/code-rs.md
+++ b/kipina-codebench/prompts/code-rs.md
@@ -0,0 +1,73 @@
+You are a Rust backend developer. Generate an Axum web project with SQLx and SQLite.
+
+Given the project requirements, JSON specification, and a REFERENCE IMPLEMENTATION, generate these files:
+
+1. Cargo.toml — axum 0.8, tokio, serde/serde_json, sqlx (sqlite, runtime-tokio), tower-http, reqwest 0.13 with features ["json", "rustls"] (for tests)
+2. src/models.rs — Structs with Serialize, Deserialize, FromRow derives
+3. src/handlers.rs — Async handler functions for each CRUD endpoint
+4. src/lib.rs — Public app(pool) function returning Router, init_db() for table creation
+5. src/main.rs — Binary entry point, connect to SQLite, bind to port
+6. tests/api_test.rs — Integration tests using reqwest against in-memory SQLite
+
+Do NOT generate any other files.
+
+OUTPUT FORMAT — use these exact markers to separate files:
+
+=== Cargo.toml ===
+<toml content>
+
+=== src/models.rs ===
+<rust code>
+
+=== src/handlers.rs ===
+<rust code>
+
+=== src/lib.rs ===
+<rust code>
+
+=== src/main.rs ===
+<rust code>
+
+=== tests/api_test.rs ===
+<rust code>
+
+DOCUMENTATION — every file starts with //! one-line module doc. Structs get /// one-line doc. Zensical: say what it IS, not what it does.
+
+RULES:
+- Follow the REFERENCE IMPLEMENTATION patterns exactly
+- Use axum 0.8 API: Router, Json, Path, State, StatusCode
+- ROUTING: use {param} NOT :param — e.g. .route("/items/{id}", get(get_item))
+- ROUTING: one .route() call per path, chain methods: .route("/items", post(create).get(list))
+- State is SqlitePool wrapped in axum::extract::State
+- app() takes SqlitePool as argument and calls .with_state(pool) on the Router
+- Handlers return Result<(StatusCode, Json<T>), StatusCode> or Result<StatusCode, StatusCode>
+- POST returns 201 (StatusCode::CREATED), DELETE returns 204 (StatusCode::NO_CONTENT), GET missing returns 404
+- CRITICAL: Use sqlx::query_as::<_, T>("SQL") runtime functions with .bind() — NEVER use sqlx::query_as!() or sqlx::query!() compile-time macros (they require DATABASE_URL at compile time)
+- Use sqlx::query("SQL") for writes (DELETE, etc.), sqlx::query_as::<_, T>("SQL") for reads
+- Use RETURNING clause in INSERT/UPDATE queries to get the created/updated row back
+- DateTime fields: store as TEXT, use String type in Rust structs
+- init_db: use .expect("msg") not Result return — keep it simple
+- NO markdown fences inside file content — just raw code
+- Edition 2024 in Cargo.toml
+- You MUST generate ALL 6 files. Do not stop early.
+
+TESTS — follow this exact spawn_server pattern:
+
+async fn spawn_server() -> (reqwest::Client, String) {
+    let pool = sqlx::sqlite::SqlitePoolOptions::new()
+        .max_connections(1)
+        .connect("sqlite::memory:")
+        .await
+        .expect("DB failed");
+    init_db(&pool).await;
+    let listener = tokio::net::TcpListener::bind("127.0.0.1:0").await.expect("Bind failed");
+    let addr = listener.local_addr().unwrap();
+    let base_url = format!("http://{addr}");
+    let router = app(pool);
+    tokio::spawn(async move { axum::serve(listener, router).await.unwrap() });
+    (reqwest::Client::new(), base_url)
+}
+
+- Each #[tokio::test] calls spawn_server() to get (client, url)
+- Unique descriptive data, NOT generic "test" strings
+- Use serde_json::json!() for request bodies
--- a/kipina-codebench/prompts/code-small.md
+++ b/kipina-codebench/prompts/code-small.md
@@ -0,0 +1,58 @@
+Generate a FastAPI project with SQLAlchemy and SQLite. Follow the REFERENCE IMPLEMENTATION exactly.
+
+Generate these 4 files with === markers:
+
+=== models.py ===
+=== schemas.py ===
+=== main.py ===
+=== test_main.py ===
+
+Key patterns (copy from reference):
+- class Base(DeclarativeBase): pass
+- Mapped[str] = mapped_column(String(255))
+- Mapped[str | None] = mapped_column(Text, default=None)
+- model_config = ConfigDict(from_attributes=True)
+- model_dump() not dict()
+- POST 201, GET list, GET by id 404, PUT, DELETE 204
+
+FOREIGN KEYS (when spec has relationships):
+- Child entity gets parent_id field: Mapped[int] = mapped_column(ForeignKey("parents.id"))
+- Import: from sqlalchemy import ForeignKey  (NOT from sqlalchemy.orm!)
+- Create schema includes parent_id: int
+- Test creates parent FIRST, then child with parent's id
+
+Example FK pattern in models.py:
+```
+class Author(Base):
+    __tablename__ = "authors"
+    id: Mapped[int] = mapped_column(primary_key=True, index=True)
+    name: Mapped[str] = mapped_column(String(255))
+
+class Post(Base):
+    __tablename__ = "posts"
+    id: Mapped[int] = mapped_column(primary_key=True, index=True)
+    title: Mapped[str] = mapped_column(String(255))
+    author_id: Mapped[int] = mapped_column(ForeignKey("authors.id"))
+```
+
+Example FK test patterns:
+```
+def test_create_post():
+    author = client.post("/authors/", json={"name": "Jane Austen"}).json()
+    response = client.post("/posts/", json={"title": "First Post", "author_id": author["id"]})
+    assert response.status_code == 201
+
+def test_update_post():
+    author = client.post("/authors/", json={"name": "Mark Twain"}).json()
+    created = client.post("/posts/", json={"title": "Old Title", "author_id": author["id"]}).json()
+    response = client.put(f"/posts/{created['id']}", json={"title": "New Title", "author_id": author["id"]})
+    assert response.status_code == 200
+```
+
+CRITICAL:
+- Use ONLY fields from the JSON spec — no created_at or extra fields
+- Generate EXACTLY 6 tests per entity: create, list, get_by_id, not_found, update, delete
+- No search, filter, or other extra tests
+- test_list: assert len(response.json()) >= 1, NEVER assert == 1 (database is shared between tests)
+- test_create for child entities: create parent FIRST, use parent's id
+- No markdown fences in output
--- a/kipina-codebench/prompts/code.md
+++ b/kipina-codebench/prompts/code.md
@@ -0,0 +1,47 @@
+You are a Python backend developer. Generate a FastAPI project with SQLAlchemy and SQLite.
+
+Given the project requirements, JSON specification, and a REFERENCE IMPLEMENTATION, generate these 4 files:
+
+1. models.py — SQLAlchemy 2.0: DeclarativeBase, Mapped, mapped_column (NOT legacy declarative_base)
+2. schemas.py — Pydantic v2: ConfigDict(from_attributes=True) (NOT class Config)
+3. main.py — FastAPI CRUD endpoints for each entity
+4. test_main.py — Pytest with TestClient, separate test.db, unique test data per test
+
+Do NOT generate pyproject.toml — it is created separately with uv.
+
+OUTPUT FORMAT — use these exact markers to separate files:
+
+=== models.py ===
+<python code>
+
+=== schemas.py ===
+<python code>
+
+=== main.py ===
+<python code>
+
+=== test_main.py ===
+<python code>
+
+DOCUMENTATION — every file must have a one-line module docstring. Classes get a one-line docstring. Keep it zensical: say what it IS, not what it does. No filler.
+
+NEVER USE DEPRECATED PATTERNS:
+- ✗ declarative_base() → ✓ class Base(DeclarativeBase): pass
+- ✗ Column(Type) → ✓ Mapped[type] = mapped_column(Type)
+- ✗ class Config: orm_mode = True → ✓ model_config = ConfigDict(from_attributes=True)
+- ✗ .dict() → ✓ .model_dump()
+- ✗ Optional[str] → ✓ str | None
+- ✗ session.query(Model).all() → ✓ session.execute(select(Model)).scalars().all()
+
+RULES:
+- Follow the REFERENCE IMPLEMENTATION patterns exactly
+- SQLAlchemy 2.0: DeclarativeBase + Mapped + mapped_column (not Column())
+- Python type unions: str | None (not Optional[str])
+- Tests: unique descriptive data per test, NOT generic "test_title" strings
+- Tests: PUT/update test data MUST include ALL required (non-nullable) fields, not just the field being updated
+- Do NOT add filter/search endpoints — only standard CRUD (create, list, get, update, delete)
+- CRITICAL: Use ONLY the fields listed in the JSON spec. NEVER add created_at, updated_at, or any field not in the spec
+- If the spec happens to include timestamp fields: use server_default=func.now() (from sqlalchemy import func) and make them optional in the Create schema (datetime | None = None, not Optional[datetime])
+- Absolute imports only (from models import ..., from schemas import ...)
+- NO markdown fences inside file content — just raw code
+- Only test endpoints that exist in main.py — no extra tests
--- a/kipina-codebench/prompts/convert-go.md
+++ b/kipina-codebench/prompts/convert-go.md
@@ -0,0 +1,25 @@
+Convert the following Python FastAPI project to Go using Chi router and modernc.org/sqlite.
+
+OUTPUT: Return ALL files with === markers:
+=== go.mod ===
+=== models.go ===
+=== handlers.go ===
+=== main.go ===
+=== handlers_test.go ===
+
+CONVERSION RULES:
+- package main for all files
+- Pydantic models → Go structs with json tags
+- SQLAlchemy ORM → database/sql with raw SQL and RETURNING clause
+- FastAPI routes → Chi router: r.Post("/path", handler(db))
+- Handlers are closures: func handler(db *sql.DB) http.HandlerFunc
+- Depends(get_db) → State passed via closure over *sql.DB
+- HTTPException(404) → http.Error(w, "not found", http.StatusNotFound)
+- POST returns http.StatusCreated (201), DELETE returns http.StatusNoContent (204)
+- sql.ErrNoRows for not-found checks
+- TestClient → httptest.NewServer + setupTestServer helper
+- test.db → sql.Open("sqlite", ":memory:")
+- Empty list: return []Entity{} not nil
+- import _ "modernc.org/sqlite" (pure Go driver, no CGO)
+- import "github.com/go-chi/chi/v5"
+- No markdown fences in output — just raw code
--- a/kipina-codebench/prompts/deprecated-patterns.md
+++ b/kipina-codebench/prompts/deprecated-patterns.md
@@ -0,0 +1,31 @@
+DEPRECATED PATTERNS — do NOT generate these. Use the modern alternative.
+
+SQLAlchemy:
+  ✗ from sqlalchemy.ext.declarative import declarative_base → ✓ from sqlalchemy.orm import DeclarativeBase
+  ✗ Base = declarative_base() → ✓ class Base(DeclarativeBase): pass
+  ✗ Column(Integer, primary_key=True) → ✓ Mapped[int] = mapped_column(primary_key=True)
+  ✗ Column(String(255)) → ✓ Mapped[str] = mapped_column(String(255))
+  ✗ session.query(User).filter_by(name="x").all() → ✓ session.execute(select(User).filter_by(name="x")).scalars().all()
+  ✗ session.query(User).get(5) → ✓ session.get(User, 5)
+  ✗ MetaData(bind=engine) → ✓ metadata.create_all(engine)
+
+Pydantic:
+  ✗ class Config: orm_mode = True → ✓ model_config = ConfigDict(from_attributes=True)
+  ✗ .dict() → ✓ .model_dump()
+  ✗ .json() → ✓ .model_dump_json()
+  ✗ parse_obj() → ✓ model_validate()
+  ✗ @validator → ✓ @field_validator
+  ✗ @root_validator → ✓ @model_validator
+  ✗ Optional[str] (auto-None in v1) → ✓ str | None = None (explicit default in v2)
+  ✗ ConstrainedInt → ✓ Annotated[int, Field(ge=0)]
+
+FastAPI:
+  ✗ status_code=201 → ✓ status_code=status.HTTP_201_CREATED (readable)
+  ✗ Manual exception strings → ✓ HTTPException(status_code=404, detail="Not found")
+  ✗ .dict() in handlers → ✓ .model_dump() (Pydantic v2)
+
+Python:
+  ✗ Optional[str] → ✓ str | None (PEP 604, Python 3.10+)
+  ✗ List[str] → ✓ list[str] (PEP 585, Python 3.9+)
+  ✗ Dict[str, int] → ✓ dict[str, int]
+  ✗ Tuple[int, ...] → ✓ tuple[int, ...]
--- a/kipina-codebench/prompts/fix.md
+++ b/kipina-codebench/prompts/fix.md
@@ -0,0 +1 @@
+You are a Python code fixer. Return ONLY the corrected Python file. No markdown fences, no explanations — just valid Python code.
--- a/kipina-codebench/prompts/golden-compact-py.md
+++ b/kipina-codebench/prompts/golden-compact-py.md
@@ -0,0 +1,36 @@
+REFERENCE PATTERNS (follow exactly):
+
+STACK: SQLAlchemy 2.0 + Pydantic v2 + FastAPI + SQLite
+
+models.py:
+  from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
+  class Base(DeclarativeBase): pass
+  Fields: Mapped[type] = mapped_column(SqlType, default=...)
+  Nullable: Mapped[str | None] = mapped_column(Text, default=None)
+  Status: Mapped[str] = mapped_column(String(20), default="pending")
+  FK: Mapped[int] = mapped_column(ForeignKey("table.id"))
+  End: Base.metadata.create_all(bind=engine)
+
+schemas.py:
+  class EntityCreate(BaseModel): fields with defaults
+  class EntityResponse(EntityCreate):
+      id: int
+      model_config = ConfigDict(from_attributes=True)
+
+main.py:
+  def get_db(): yield SessionLocal(); finally close
+  POST /{table}/ → 201, model_dump()
+  GET /{table}/ → list
+  GET /{table}/{id} → 404 if not found
+  PUT /{table}/{id} → model_dump(), setattr loop
+  DELETE /{table}/{id} → 204
+
+test_main.py:
+  test.db + override_get_db + TestClient
+  Unique descriptive data per test ("Buy milk", "Fetchable task"...)
+  test_create → 201 + assert "id" in json
+  test_list → post first, get, assert len >= 1
+  test_get_by_id → post, get by id, assert id matches
+  test_not_found → get /99999 → 404
+  test_update → post, put with ALL required fields, assert 200
+  test_delete → post, delete 204, get again → 404
--- a/kipina-codebench/prompts/golden-compact-rs.md
+++ b/kipina-codebench/prompts/golden-compact-rs.md
@@ -0,0 +1,43 @@
+REFERENCE PATTERNS (follow exactly):
+
+STACK: Axum 0.8 + SQLx + SQLite + Tokio + Serde
+
+Cargo.toml:
+  edition = "2024"
+  deps: axum 0.8, tokio (full), serde (derive), serde_json, sqlx (sqlite, runtime-tokio), tower-http (cors)
+  dev: reqwest 0.13 (rustls)
+
+src/models.rs:
+  #[derive(Debug, Serialize, Deserialize, FromRow)]
+  struct Entity { id: i64, field: String, optional: Option<String> }
+  struct CreateEntity { field: String, optional: Option<String> }
+  Status fields: String with default "pending"
+
+src/handlers.rs:
+  async fn create(State(pool), Json(input)) -> (StatusCode, Json<Entity>)
+  POST → StatusCode::CREATED, sqlx::query("INSERT...").execute + query_as last_insert_rowid
+  GET list → query_as("SELECT * FROM table").fetch_all
+  GET by id → query_as.fetch_optional, return 404 if None
+  PUT → query("UPDATE...SET...WHERE id=?"), rows_affected == 0 → 404
+  DELETE → StatusCode::NO_CONTENT, rows_affected == 0 → 404
+
+src/lib.rs:
+  pub fn app(pool: SqlitePool) -> Router
+  pub async fn init_db(pool: &SqlitePool) → CREATE TABLE IF NOT EXISTS
+  Routes: .route("/{table}", post(create).get(list))
+          .route("/{table}/{id}", get(get_one).put(update).delete(delete_one))
+
+src/main.rs:
+  SqlitePool::connect("sqlite:./app.db"), init_db, bind 0.0.0.0:3000
+
+tests/api_test.rs:
+  Each test: SqlitePool::connect("sqlite::memory:"), init_db, app(pool)
+  Spawn on random port: TcpListener::bind("127.0.0.1:0"), axum::serve
+  reqwest::Client for HTTP calls
+  Unique descriptive data ("Buy milk", "Fetchable task"...)
+  test_create → 201 + assert id exists
+  test_list → post first, get, assert len >= 1
+  test_get_by_id → post, get, assert id matches
+  test_not_found → 404
+  test_update → post, put with ALL fields, assert 200
+  test_delete → post, delete 204, get → 404
--- a/kipina-codebench/prompts/spec-plain.md
+++ b/kipina-codebench/prompts/spec-plain.md
@@ -0,0 +1,19 @@
+You design database schemas. Output ONLY the schema in this exact format, nothing else.
+
+FORMAT (one entity per line):
+project: project-name
+entity EntityName (table_name): field1 type, field2 type, field3 type=default
+entity ChildName (table_name): field1 type, parent_id int->ParentName, field2 type
+
+TYPES: string, text, int, float, bool, date, datetime
+RULES:
+- id is automatic, do NOT include it
+- FK fields end with _id and use -> to reference parent
+- Parent entities BEFORE children
+- Max 7 fields per entity, max 3 entities
+- Status fields: string with =default (e.g. status string=draft)
+
+EXAMPLE:
+project: blog-api
+entity Author (authors): name string, email string, bio text
+entity Post (posts): title string, content text, author_id int->Author, published_at datetime, status string=draft
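The line format described in spec-plain.md is regular enough to parse with two regular expressions. The following is a minimal sketch, not the project's own parser: `parse_plain_spec`, `FieldSpec`, and `EntitySpec` are hypothetical names, and it assumes that default values contain no commas or spaces.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Type names allowed by the TYPES line of the prompt.
ALLOWED_TYPES = {"string", "text", "int", "float", "bool", "date", "datetime"}

# "entity Name (table_name): field list"
ENTITY_RE = re.compile(r"^entity\s+(\w+)\s+\((\w+)\):\s*(.+)$")
# "name type", optionally followed by "->Parent" and/or "=default"
FIELD_RE = re.compile(r"^(\w+)\s+(\w+)(?:->(\w+))?(?:=(\S+))?$")


@dataclass
class FieldSpec:  # hypothetical container, not from the repository
    name: str
    type: str
    default: str | None = None
    references: str | None = None


@dataclass
class EntitySpec:  # hypothetical container, not from the repository
    name: str
    table_name: str
    fields: list[FieldSpec] = field(default_factory=list)


def parse_plain_spec(text: str) -> tuple[str, list[EntitySpec]]:
    """Parse 'project:' and 'entity' lines into a project name and entities."""
    project = ""
    entities: list[EntitySpec] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("project:"):
            project = line.split(":", 1)[1].strip()
            continue
        match = ENTITY_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognized line: {line!r}")
        name, table_name, body = match.groups()
        entity = EntitySpec(name, table_name)
        for part in body.split(","):
            field_match = FIELD_RE.match(part.strip())
            if field_match is None:
                raise ValueError(f"bad field in {name}: {part.strip()!r}")
            fname, ftype, parent, default = field_match.groups()
            if ftype not in ALLOWED_TYPES:
                raise ValueError(f"unknown type {ftype!r} in {name}.{fname}")
            entity.fields.append(FieldSpec(fname, ftype, default, parent))
        entities.append(entity)
    return project, entities


# The EXAMPLE block from the prompt, used as input.
EXAMPLE = """\
project: blog-api
entity Author (authors): name string, email string, bio text
entity Post (posts): title string, content text, author_id int->Author, published_at datetime, status string=draft
"""

project, entities = parse_plain_spec(EXAMPLE)
```

Raising on the first unrecognized line is a design choice for the sketch; a pipeline with fix rounds might prefer to collect all problems and feed them back to the model.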
--- a/kipina-codebench/prompts/spec-simple.md
+++ b/kipina-codebench/prompts/spec-simple.md
@@ -0,0 +1,17 @@
+You design database schemas. Output ONLY valid JSON, no explanations.
+
+SCHEMA:
+{"project_name":"name","entities":[{"name":"Entity","table_name":"entities","fields":[{"name":"field","type":"string","nullable":false,"default":null}]}],"relationships":[{"from":"Child","field":"parent_id","to":"Parent"}]}
+
+FIELD TYPES: string, text, int, float, bool, date, datetime
+- Status fields: type "string", default "draft" or "pending"
+- id is automatic — do NOT include it
+- FK fields: type "int", name ends with _id
+
+RULES:
+- Parent entities BEFORE children in array
+- Every _id field needs a relationship entry
+- Max 7 fields, max 3 entities
+- English names only
+
+EXAMPLE: Blog → Author: name(string), email(string) / Post: title(string), content(text), author_id(int)→Author, status(string,default="draft")
--- a/kipina-codebench/prompts/spec.md
+++ b/kipina-codebench/prompts/spec.md
@@ -0,0 +1,31 @@
+You are a software architect who designs database schemas for Python web applications.
+
+THINK STEP BY STEP before outputting JSON:
+1. What are the main ENTITIES (nouns) in this project?
+2. What FIELDS does each entity need? (name, type, required?)
+3. Which entities REFERENCE each other? (e.g. "a Book belongs to an Author" → Book has author_id)
+4. Are there Date/DateTime fields? → add extra_imports
+
+Then output ONLY valid JSON (no explanations before or after).
+
+SCHEMA:
+{"project_name":"short-name","description":"One sentence","entities":[{"name":"EntityName","table_name":"entity_names","fields":[{"name":"field_name","sa_type":"String(255)","py_type":"str","nullable":false,"default":null}]}],"relationships":[{"from":"ChildEntity","field":"parent_id","to":"ParentEntity","type":"many-to-one"}],"extra_imports":[]}
+
+FIELD RULES:
+- sa_type: String(N), Text, Integer, Date, DateTime, Boolean, Float
+- py_type: str, int, float, bool, date, datetime — append " | None" if nullable
+- Status fields: use String(20) with default value, NEVER Enum
+- Every entity gets "id" automatically — do NOT add id or redundant ID fields
+- Use snake_case for field names
+
+RELATIONSHIP RULES:
+- If entity A "belongs to" entity B → A has b_id field (Integer, nullable=false) + relationship entry
+- EVERY _id field MUST have a matching relationship entry
+- Parent entities must appear BEFORE children in the entities array
+- If no relationships, set "relationships": []
+
+AVOID: redundant ID fields, generic names, more than 7 fields or 3 entities, non-English entity/field names (ALWAYS English even if description is Finnish)
+
+EXAMPLES (adapt, don't copy):
+Todo app → Todo: title(str), description(Text|None), due_date(Date|None), status(String20="pending")
+Blog → Author: name,email,bio(Text|None) / Post: title, content(Text), author_id→Author, published_at(DateTime|None), status(String20="draft")
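The structural rules in spec.md (no explicit id, every _id field backed by a relationship entry, parents before children, at most 7 fields and 3 entities) can be checked mechanically once the model's JSON has been parsed. The following is a sketch under assumptions: `validate_spec` is a hypothetical helper, not the repository's validator, and it expects the dict shape shown in the SCHEMA line.

```python
def validate_spec(spec: dict) -> list[str]:
    """Return human-readable rule violations; an empty list means the spec passes."""
    issues: list[str] = []
    entities = spec.get("entities", [])
    relationships = spec.get("relationships", [])

    if len(entities) > 3:
        issues.append("more than 3 entities")

    # Position of each entity in the array, used for the ordering rule.
    order = {entity["name"]: index for index, entity in enumerate(entities)}
    # (child entity, FK field) pairs that have a relationship entry.
    linked = {(rel["from"], rel["field"]) for rel in relationships}

    for entity in entities:
        names = [f["name"] for f in entity.get("fields", [])]
        if len(names) > 7:
            issues.append(f"{entity['name']}: more than 7 fields")
        if "id" in names:
            issues.append(f"{entity['name']}: explicit id field")
        for name in names:
            if name.endswith("_id") and (entity["name"], name) not in linked:
                issues.append(f"{entity['name']}.{name}: no relationship entry")

    for rel in relationships:
        child, parent = rel["from"], rel["to"]
        if child not in order or parent not in order:
            issues.append(f"relationship {child}->{parent}: unknown entity")
        elif order[parent] > order[child]:
            issues.append(f"{parent} must appear before {child}")
    return issues


# A spec that follows the rules, modeled on the Blog example.
good_spec = {
    "project_name": "blog",
    "entities": [
        {"name": "Author", "table_name": "authors",
         "fields": [{"name": "name"}, {"name": "email"}]},
        {"name": "Post", "table_name": "posts",
         "fields": [{"name": "title"}, {"name": "author_id"}]},
    ],
    "relationships": [{"from": "Post", "field": "author_id", "to": "Author"}],
}

# The same entities in the wrong order, with the relationship entry missing.
bad_spec = {
    "project_name": "blog",
    "entities": list(reversed(good_spec["entities"])),
    "relationships": [],
}

good_issues = validate_spec(good_spec)
bad_issues = validate_spec(bad_spec)
```

The ordering check uses a strict comparison, so a self-referencing entity is accepted in this sketch; whether the real pipeline allows that is not stated in the prompt.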
--- a/kipina-codebench/report-template.html
+++ b/kipina-codebench/report-template.html
@@ -0,0 +1,183 @@
+<!DOCTYPE html>
+<html lang="fi">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Kipina Model Benchmark</title>
+<style>
+  :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
+  * { box-sizing: border-box; margin: 0; padding: 0; }
+  body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
+  h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
+  .meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
+  .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
+  .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
+  .card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
+  .card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
+  .card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
+  table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
+  th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
+  th:hover { color: var(--text); }
+  th.sorted-asc::after { content: ' ▲'; }
+  th.sorted-desc::after { content: ' ▼'; }
+  td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
+  tr:hover td { background: #1c2128; }
+  .pass { color: var(--green); }
+  .partial { color: var(--yellow); }
+  .fail { color: var(--red); }
+  .stars { letter-spacing: 1px; }
+  .bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
+  .bar-bg { background: var(--border); }
+  .bar-fill { background: var(--green); }
+  .bar-partial { background: var(--yellow); }
+  .model-name { font-weight: 600; }
+  h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
+  .summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
+</style>
+</head>
+<body>
+
+<h1>Kipina Model Benchmark</h1>
+<div class="meta" id="meta"></div>
+
+<div class="cards" id="cards"></div>
+
+<h2>Per-model summary</h2>
+<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>
+
+<h2>All results</h2>
+<table id="results-table"><thead></thead><tbody></tbody></table>
+
+<script>
+const RAW = /*DATA_PLACEHOLDER*/[];
+
+const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
+function calcScore(r) {
+  if (r.error && r.testsTotal === 0) return 0;
+  let s = 0;
+  if (r.specOk) s += 10;
+  if (!r.error || r.testsTotal > 0) s += 10;
+  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
+  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
+  return Math.min(100, s);
+}
+// Compute score, stars, and prompt-token estimate when they are missing
+const DATA = RAW.map(r => {
+  if (r.score == null) r.score = calcScore(r);
+  if (!r.stars) r.stars = starsFor(r.score);
+  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
+  return r;
+});
+const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
+const pctBar = (passed, total, w=80) => {
+  if (total === 0) return '-';
+  const pct = passed/total*100;
+  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
+  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
+};
+
+// Meta
+const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
+document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} runs — ${(totalTime/1000/60).toFixed(1)} min`;
+
+// Cards
+const models = [...new Set(DATA.map(r => r.model))];
+const scenarios = [...new Set(DATA.map(r => r.scenario))];
+const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
+const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
+const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
+const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
+const bestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
+}).sort((a,b) => b.avg - a.avg)[0];
+const fastestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
+}).sort((a,b) => b.speed - a.speed)[0];
+
+document.getElementById('cards').innerHTML = `
+  <div class="card"><div class="label">Average</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} points</div></div>
+  <div class="card"><div class="label">Test pass rate</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} tests</div></div>
+  <div class="card"><div class="label">Best model</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
+  <div class="card"><div class="label">Fastest</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
+  <div class="card"><div class="label">Models</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} scenarios</div></div>
+  <div class="card"><div class="label">Total time</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minutes</div></div>
+`;
+
+// Summary table
+const sumHead = document.querySelector('#summary-table thead');
+const sumBody = document.querySelector('#summary-table tbody');
+sumHead.innerHTML = '<tr><th>Model</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Total</th><th>Out tok</th><th>Time</th><th>tok/s</th><th>Score</th></tr>';
+
+const modelRows = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
+  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
+  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
+  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
+  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
+  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
+  const scenCols = scenarios.map(s => {
+    const r = mrs.find(r => r.scenario === s);
+    if (!r) return '<td>-</td>';
+    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
+  }).join('');
+  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
+}).sort((a,b) => b.avg - a.avg);
+sumBody.innerHTML = modelRows.map(r => r.html).join('');
+
+// Results table
+const resHead = document.querySelector('#results-table thead');
+const resBody = document.querySelector('#results-table tbody');
+const resCols = ['Model','Scenario','Spec','Tests','Fix rounds','Ctx','Out tok','Time','tok/s','Score'];
+resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
+
+let sortCol = 9, sortAsc = false;
+function renderResults() {
+  const sorted = [...DATA].sort((a,b) => {
+    const vals = [
+      [a.model, b.model],
+      [a.scenario, b.scenario],
+      [a.specEntities, b.specEntities],
+      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
+      [a.fixRounds, b.fixRounds],
+      [a.promptTokensEst, b.promptTokensEst],
+      [a.totalTokens, b.totalTokens],
+      [a.totalDurationMs, b.totalDurationMs],
+      [a.avgTokPerSec, b.avgTokPerSec],
+      [a.score, b.score],
+    ][sortCol];
+    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
+    return sortAsc ? cmp : -cmp;
+  });
+  resBody.innerHTML = sorted.map(r => {
+    const c = cls(r);
+    return `<tr>
+      <td class="model-name">${r.model}</td>
+      <td>${r.scenario}</td>
+      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
+      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
+      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
+      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
+      <td>${r.avgTokPerSec.toFixed(0)}</td>
+      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
+    </tr>`;
+  }).join('');
+  document.querySelectorAll('#results-table th').forEach((th,i) => {
+    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
+  });
+}
+document.querySelector('#results-table thead').addEventListener('click', e => {
+  const col = parseInt(e.target.dataset.col, 10);
+  if (isNaN(col)) return;
+  if (sortCol === col) sortAsc = !sortAsc;
+  else { sortCol = col; sortAsc = false; }
+  renderResults();
+});
+renderResults();
+</script>
+</body>
+</html>
--- a/kipina-codebench/results/2026-04-14T06-49.html
+++ b/kipina-codebench/results/2026-04-14T06-49.html
--- a/kipina-codebench/results/2026-04-14T06-49.json
+++ b/kipina-codebench/results/2026-04-14T06-49.json
@@ -0,0 +1,422 @@
+[
+  {
+    "model": "qwen3.5:9b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 3,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 1,
+    "testsPassed": 0,
+    "testsFailed": 1,
+    "totalDurationMs": 65901,
+    "totalTokens": 5056,
+    "avgTokPerSec": 82.99139473832963,
+    "promptChars": 12334,
+    "promptTokensEst": 3084,
+    "score": 40,
+    "stars": "★★☆☆☆",
+    "error": null
+  },
+  {
+    "model": "qwen3.5:9b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 1,
+    "fixRounds": 2,
+    "testsTotal": 1,
+    "testsPassed": 0,
+    "testsFailed": 1,
+    "totalDurationMs": 74087,
+    "totalTokens": 5645,
+    "avgTokPerSec": 83.57073831360164,
+    "promptChars": 10757,
+    "promptTokensEst": 2689,
+    "score": 20,
+    "stars": "★☆☆☆☆",
+    "error": null
+  },
+  {
+    "model": "qwen3.5:9b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 1,
+    "testsPassed": 0,
+    "testsFailed": 1,
+    "totalDurationMs": 49830,
+    "totalTokens": 3803,
+    "avgTokPerSec": 83.26266260763309,
+    "promptChars": 10826,
+    "promptTokensEst": 2707,
+    "score": 40,
+    "stars": "★★☆☆☆",
+    "error": null
+  },
+  {
+    "model": "gemma4:e4b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 1,
+    "testsPassed": 0,
+    "testsFailed": 1,
+    "totalDurationMs": 57032,
+    "totalTokens": 4924,
+    "avgTokPerSec": 106.02334905805122,
+    "promptChars": 11313,
+    "promptTokensEst": 2828,
+    "score": 40,
+    "stars": "★★☆☆☆",
+    "error": null
+  },
+  {
+    "model": "gemma4:e4b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 7,
+    "testsPassed": 5,
+    "testsFailed": 2,
+    "totalDurationMs": 54307,
+    "totalTokens": 5060,
+    "avgTokPerSec": 106.89447491163497,
+    "promptChars": 11225,
+    "promptTokensEst": 2806,
+    "score": 83,
+    "stars": "★★★★☆",
+    "error": null
+  },
+  {
+    "model": "gemma4:e4b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 11,
+    "testsPassed": 2,
+    "testsFailed": 9,
+    "totalDurationMs": 57080,
+    "totalTokens": 5310,
+    "avgTokPerSec": 106.64914988130955,
+    "promptChars": 11791,
+    "promptTokensEst": 2948,
+    "score": 51,
+    "stars": "★★★☆☆",
+    "error": null
+  },
+  {
+    "model": "qwen2.5-coder:3b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 3,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 1,
+    "testsPassed": 0,
+    "testsFailed": 1,
+    "totalDurationMs": 22377,
+    "totalTokens": 3534,
+    "avgTokPerSec": 201.24475679283708,
+    "promptChars": 11479,
+    "promptTokensEst": 2870,
+    "score": 40,
+    "stars": "★★☆☆☆",
+    "error": null
+  },
+  {
+    "model": "qwen2.5-coder:3b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 8,
+    "fixRounds": 2,
+    "testsTotal": 1,
+    "testsPassed": 0,
+    "testsFailed": 1,
+    "totalDurationMs": 44520,
+    "totalTokens": 7495,
+    "avgTokPerSec": 201.87149050701015,
+    "promptChars": 11886,
+    "promptTokensEst": 2972,
+    "score": 20,
+    "stars": "★☆☆☆☆",
+    "error": null
+  },
+  {
+    "model": "qwen2.5-coder:3b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 1,
+    "testsPassed": 0,
+    "testsFailed": 1,
+    "totalDurationMs": 20136,
+    "totalTokens": 3338,
+    "avgTokPerSec": 200.86152095722105,
+    "promptChars": 11228,
+    "promptTokensEst": 2807,
+    "score": 40,
+    "stars": "★★☆☆☆",
+    "error": null
+  },
+  {
+    "model": "qwen2.5-coder:7b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": false,
+    "specEntities": 0,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 0,
+    "promptTokensEst": 0,
+    "score": 0,
+    "stars": "",
+    "error": "JSON-speksi epäonnistui"
+  },
+  {
+    "model": "qwen2.5-coder:7b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 1,
+    "testsPassed": 0,
+    "testsFailed": 1,
+    "totalDurationMs": 20012,
+    "totalTokens": 2119,
+    "avgTokPerSec": 122.7557304112134,
+    "promptChars": 10342,
+    "promptTokensEst": 2586,
+    "score": 40,
+    "stars": "★★☆☆☆",
+    "error": null
+  },
+  {
+    "model": "qwen2.5-coder:7b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 1,
+    "testsPassed": 0,
+    "testsFailed": 1,
+    "totalDurationMs": 26133,
+    "totalTokens": 2715,
+    "avgTokPerSec": 121.94987205993503,
+    "promptChars": 11193,
+    "promptTokensEst": 2798,
+    "score": 40,
+    "stars": "★★☆☆☆",
+    "error": null
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 44757,
+    "totalTokens": 2156,
+    "avgTokPerSec": 60.77636586631207,
+    "promptChars": 9635,
+    "promptTokensEst": 2409,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 7,
+    "testsPassed": 7,
+    "testsFailed": 0,
+    "totalDurationMs": 41166,
+    "totalTokens": 2282,
+    "avgTokPerSec": 61.14821289733007,
+    "promptChars": 9575,
+    "promptTokensEst": 2394,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 12,
+    "testsPassed": 12,
+    "testsFailed": 0,
+    "totalDurationMs": 66478,
+    "totalTokens": 3681,
+    "avgTokPerSec": 60.493817783668725,
+    "promptChars": 10500,
+    "promptTokensEst": 2625,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 7,
+    "testsPassed": 7,
+    "testsFailed": 0,
+    "totalDurationMs": 29801,
+    "totalTokens": 2249,
+    "avgTokPerSec": 98.5661742189331,
+    "promptChars": 9615,
+    "promptTokensEst": 2404,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 8,
+    "testsPassed": 6,
+    "testsFailed": 2,
+    "totalDurationMs": 22974,
+    "totalTokens": 2050,
+    "avgTokPerSec": 101.2398768597589,
+    "promptChars": 9273,
+    "promptTokensEst": 2318,
+    "score": 85,
+    "stars": "★★★★☆",
+    "error": null
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 12,
+    "testsPassed": 12,
+    "testsFailed": 0,
+    "totalDurationMs": 39335,
+    "totalTokens": 3537,
+    "avgTokPerSec": 100.10984073540648,
+    "promptChars": 10525,
+    "promptTokensEst": 2631,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null
+  },
+  {
+    "model": "qwen3:4b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 58668,
+    "totalTokens": 7134,
+    "avgTokPerSec": 141.76822189196028,
+    "promptChars": 15202,
+    "promptTokensEst": 3801,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null
+  },
+  {
+    "model": "qwen3:4b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": false,
+    "specEntities": 0,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 0,
+    "promptTokensEst": 0,
+    "score": 0,
+    "stars": "",
+    "error": "JSON-speksi epäonnistui"
+  },
+  {
+    "model": "qwen3:4b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": false,
+    "specEntities": 0,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 0,
+    "promptTokensEst": 0,
+    "score": 0,
+    "stars": "",
+    "error": "JSON-speksi epäonnistui"
+  }
+]
--- a/kipina-codebench/results/2026-04-14T07-13.html
+++ b/kipina-codebench/results/2026-04-14T07-13.html
@@ -0,0 +1,183 @@
+<!DOCTYPE html>
+<html lang="fi">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Kipina Model Benchmark</title>
+<style>
+  :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
+  * { box-sizing: border-box; margin: 0; padding: 0; }
+  body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
+  h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
+  .meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
+  .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
+  .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
+  .card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
+  .card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
+  .card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
+  table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
+  th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
+  th:hover { color: var(--text); }
+  th.sorted-asc::after { content: ' ▲'; }
+  th.sorted-desc::after { content: ' ▼'; }
+  td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
+  tr:hover td { background: #1c2128; }
+  .pass { color: var(--green); }
+  .partial { color: var(--yellow); }
+  .fail { color: var(--red); }
+  .stars { letter-spacing: 1px; }
+  .bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
+  .bar-bg { background: var(--border); }
+  .bar-fill { background: var(--green); }
+  .bar-partial { background: var(--yellow); }
+  .model-name { font-weight: 600; }
+  h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
+  .summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
+</style>
+</head>
+<body>
+
+<h1>Kipina Model Benchmark</h1>
+<div class="meta" id="meta"></div>
+
+<div class="cards" id="cards"></div>
+
+<h2>Per-model summary</h2>
+<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>
+
+<h2>All results</h2>
+<table id="results-table"><thead></thead><tbody></tbody></table>
+
+<script>
+const RAW = [{"model":"qwen3:14b","scenario":"todo","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":1,"testsPassed":0,"testsFailed":1,"totalDurationMs":186642,"totalTokens":10237,"avgTokPerSec":59.06411550065281,"promptChars":10576,"promptTokensEst":2644,"score":40,"stars":"★★☆☆☆","error":null},{"model":"qwen3:14b","scenario":"users","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":1,"testsPassed":0,"testsFailed":1,"totalDurationMs":121848,"totalTokens":6735,"avgTokPerSec":59.85231850668119,"promptChars":9684,"promptTokensEst":2421,"score":40,"stars":"★★☆☆☆","error":null},{"model":"qwen3:14b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":11,"testsPassed":9,"testsFailed":2,"totalDurationMs":83491,"totalTokens":4677,"avgTokPerSec":60.222832434869694,"promptChars":10423,"promptTokensEst":2606,"score":89,"stars":"★★★★☆","error":null},{"model":"qwen3:8b","scenario":"todo","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":6,"testsPassed":6,"testsFailed":0,"totalDurationMs":56288,"totalTokens":5235,"avgTokPerSec":99.60027546406452,"promptChars":9307,"promptTokensEst":2327,"score":100,"stars":"★★★★★","error":null},{"model":"qwen3:8b","scenario":"users","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":6,"testsPassed":5,"testsFailed":1,"totalDurationMs":59639,"totalTokens":5526,"avgTokPerSec":99.6742208632186,"promptChars":9158,"promptTokensEst":2290,"score":90,"stars":"★★★★★","error":null},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":11,"testsPassed":10,"testsFailed":1,"totalDurationMs":131793,"totalTokens":11779,"avgTokPerSec":97.17878362853351,"promptChars":10390,"promptTokensEst":2598,"score":95,"stars":"★★★★★","error":null}];
+
+const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
+function calcScore(r) {
+  if (r.error && r.testsTotal === 0) return 0;
+  let s = 0;
+  if (r.specOk) s += 10;
+  if (!r.error || r.testsTotal > 0) s += 10;
+  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
+  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
+  return Math.min(100, s);
+}
+// Compute score, stars, and prompt-token estimate when they are missing
+const DATA = RAW.map(r => {
+  if (r.score == null) r.score = calcScore(r);
+  if (!r.stars) r.stars = starsFor(r.score);
+  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
+  return r;
+});
+const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
+const pctBar = (passed, total, w=80) => {
+  if (total === 0) return '-';
+  const pct = passed/total*100;
+  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
+  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
+};
+
+// Meta
+const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
+document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;
+
+// Cards
+const models = [...new Set(DATA.map(r => r.model))];
+const scenarios = [...new Set(DATA.map(r => r.scenario))];
+const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
+const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
+const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
+const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
+const bestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
+}).sort((a,b) => b.avg - a.avg)[0];
+const fastestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
+}).sort((a,b) => b.speed - a.speed)[0];
+
+document.getElementById('cards').innerHTML = `
+  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistettä</div></div>
+  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
+  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
+  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
+  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
+  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
+`;
+
+// Summary table
+const sumHead = document.querySelector('#summary-table thead');
+const sumBody = document.querySelector('#summary-table tbody');
+sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';
+
+const modelRows = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
+  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
+  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
+  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
+  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
+  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
+  const scenCols = scenarios.map(s => {
+    const r = mrs.find(r => r.scenario === s);
+    if (!r) return '<td>-</td>';
+    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
+  }).join('');
+  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
+}).sort((a,b) => b.avg - a.avg);
+sumBody.innerHTML = modelRows.map(r => r.html).join('');
+
+// Results table
+const resHead = document.querySelector('#results-table thead');
+const resBody = document.querySelector('#results-table tbody');
+const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
+resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
+
+let sortCol = 9, sortAsc = false;
+function renderResults() {
+  const sorted = [...DATA].sort((a,b) => {
+    const vals = [
+      [a.model, b.model],
+      [a.scenario, b.scenario],
+      [a.specEntities, b.specEntities],
+      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
+      [a.fixRounds, b.fixRounds],
+      [a.promptTokensEst, b.promptTokensEst],
+      [a.totalTokens, b.totalTokens],
+      [a.totalDurationMs, b.totalDurationMs],
+      [a.avgTokPerSec, b.avgTokPerSec],
+      [a.score, b.score],
+    ][sortCol];
+    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
+    return sortAsc ? cmp : -cmp;
+  });
+  resBody.innerHTML = sorted.map(r => {
+    const c = cls(r);
+    return `<tr>
+      <td class="model-name">${r.model}</td>
+      <td>${r.scenario}</td>
+      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
+      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
+      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
+      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
+      <td>${r.avgTokPerSec.toFixed(0)}</td>
+      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
+    </tr>`;
+  }).join('');
+  document.querySelectorAll('#results-table th').forEach((th,i) => {
+    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
+  });
+}
+document.querySelector('#results-table thead').addEventListener('click', e => {
+  const col = parseInt(e.target.dataset.col, 10);
+  if (Number.isNaN(col)) return;
+  if (sortCol === col) sortAsc = !sortAsc;
+  else { sortCol = col; sortAsc = false; }
+  renderResults();
+});
+renderResults();
+</script>
+</body>
+</html>
--- a/kipina-codebench/results/2026-04-14T07-13.json
+++ b/kipina-codebench/results/2026-04-14T07-13.json
@@ -0,0 +1,122 @@
+[
+  {
+    "model": "qwen3:14b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 1,
+    "testsPassed": 0,
+    "testsFailed": 1,
+    "totalDurationMs": 186642,
+    "totalTokens": 10237,
+    "avgTokPerSec": 59.06411550065281,
+    "promptChars": 10576,
+    "promptTokensEst": 2644,
+    "score": 40,
+    "stars": "★★☆☆☆",
+    "error": null
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 1,
+    "testsPassed": 0,
+    "testsFailed": 1,
+    "totalDurationMs": 121848,
+    "totalTokens": 6735,
+    "avgTokPerSec": 59.85231850668119,
+    "promptChars": 9684,
+    "promptTokensEst": 2421,
+    "score": 40,
+    "stars": "★★☆☆☆",
+    "error": null
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 11,
+    "testsPassed": 9,
+    "testsFailed": 2,
+    "totalDurationMs": 83491,
+    "totalTokens": 4677,
+    "avgTokPerSec": 60.222832434869694,
+    "promptChars": 10423,
+    "promptTokensEst": 2606,
+    "score": 89,
+    "stars": "★★★★☆",
+    "error": null
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 56288,
+    "totalTokens": 5235,
+    "avgTokPerSec": 99.60027546406452,
+    "promptChars": 9307,
+    "promptTokensEst": 2327,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 5,
+    "testsFailed": 1,
+    "totalDurationMs": 59639,
+    "totalTokens": 5526,
+    "avgTokPerSec": 99.6742208632186,
+    "promptChars": 9158,
+    "promptTokensEst": 2290,
+    "score": 90,
+    "stars": "★★★★★",
+    "error": null
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 11,
+    "testsPassed": 10,
+    "testsFailed": 1,
+    "totalDurationMs": 131793,
+    "totalTokens": 11779,
+    "avgTokPerSec": 97.17878362853351,
+    "promptChars": 10390,
+    "promptTokensEst": 2598,
+    "score": 95,
+    "stars": "★★★★★",
+    "error": null
+  }
+]
--- a/kipina-codebench/results/2026-04-14T07-18.html
+++ b/kipina-codebench/results/2026-04-14T07-18.html
@@ -0,0 +1,183 @@
+<!DOCTYPE html>
+<html lang="fi">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Kipina Model Benchmark</title>
+<style>
+  :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
+  * { box-sizing: border-box; margin: 0; padding: 0; }
+  body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
+  h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
+  .meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
+  .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
+  .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
+  .card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
+  .card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
+  .card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
+  table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
+  th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
+  th:hover { color: var(--text); }
+  th.sorted-asc::after { content: ' ▲'; }
+  th.sorted-desc::after { content: ' ▼'; }
+  td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
+  tr:hover td { background: #1c2128; }
+  .pass { color: var(--green); }
+  .partial { color: var(--yellow); }
+  .fail { color: var(--red); }
+  .stars { letter-spacing: 1px; }
+  .bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
+  .bar-bg { background: var(--border); }
+  .bar-fill { background: var(--green); }
+  .bar-partial { background: var(--yellow); }
+  .model-name { font-weight: 600; }
+  h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
+  .summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
+</style>
+</head>
+<body>
+
+<h1>Kipina Model Benchmark</h1>
+<div class="meta" id="meta"></div>
+
+<div class="cards" id="cards"></div>
+
+<h2>Mallikohtainen yhteenveto</h2>
+<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>
+
+<h2>Kaikki tulokset</h2>
+<table id="results-table"><thead></thead><tbody></tbody></table>
+
+<script>
+const RAW = [{"model":"qwen3:14b","scenario":"todo","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":1,"testsPassed":0,"testsFailed":1,"totalDurationMs":66903,"totalTokens":5454,"avgTokPerSec":86.45918994499432,"promptChars":9985,"promptTokensEst":2496,"score":40,"stars":"★★☆☆☆","error":null},{"model":"qwen3:14b","scenario":"users","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":1,"testsPassed":0,"testsFailed":1,"totalDurationMs":87618,"totalTokens":7150,"avgTokPerSec":87.21782190501095,"promptChars":9922,"promptTokensEst":2481,"score":40,"stars":"★★☆☆☆","error":null},{"model":"qwen3:14b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":9,"testsPassed":5,"testsFailed":4,"totalDurationMs":78398,"totalTokens":6427,"avgTokPerSec":85.52353711143463,"promptChars":10737,"promptTokensEst":2684,"score":73,"stars":"★★★★☆","error":null},{"model":"qwen3:8b","scenario":"todo","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":8,"testsPassed":7,"testsFailed":1,"totalDurationMs":82750,"totalTokens":10054,"avgTokPerSec":139.90690936146032,"promptChars":9360,"promptTokensEst":2340,"score":93,"stars":"★★★★★","error":null},{"model":"qwen3:8b","scenario":"users","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":6,"testsPassed":6,"testsFailed":0,"totalDurationMs":32233,"totalTokens":4404,"avgTokPerSec":143.4997404058814,"promptChars":9310,"promptTokensEst":2328,"score":100,"stars":"★★★★★","error":null},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":1,"testsPassed":0,"testsFailed":1,"totalDurationMs":88563,"totalTokens":11575,"avgTokPerSec":141.54675017528362,"promptChars":10567,"promptTokensEst":2642,"score":40,"stars":"★★☆☆☆","error":null}];
+
+const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
+function calcScore(r) {
+  if (r.error && r.testsTotal === 0) return 0; // errored and no tests were collected
+  let s = 0;
+  if (r.specOk) s += 10; // 10 p: spec OK
+  if (!r.error || r.testsTotal > 0) s += 10; // 10 p: no error, or tests still ran
+  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60); // up to 60 p: test pass rate
+  s += Math.max(0, 20 - (r.fixRounds || 0) * 10); // up to 20 p: minus 10 per fix round
+  return Math.min(100, s);
+}
+// Compute scores if missing
+const DATA = RAW.map(r => {
+  if (r.score == null) r.score = calcScore(r);
+  if (!r.stars) r.stars = starsFor(r.score);
+  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
+  return r;
+});
+const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
+const pctBar = (passed, total, w=80) => {
+  if (total === 0) return '-';
+  const pct = passed/total*100;
+  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
+  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
+};
+
+// Meta
+const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
+document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;
+
+// Cards
+const models = [...new Set(DATA.map(r => r.model))];
+const scenarios = [...new Set(DATA.map(r => r.scenario))];
+const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
+const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
+const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
+const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
+const bestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
+}).sort((a,b) => b.avg - a.avg)[0];
+const fastestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
+}).sort((a,b) => b.speed - a.speed)[0];
+
+document.getElementById('cards').innerHTML = `
+  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistettä</div></div>
+  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
+  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
+  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
+  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
+  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
+`;
+
+// Summary table
+const sumHead = document.querySelector('#summary-table thead');
+const sumBody = document.querySelector('#summary-table tbody');
+sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';
+
+const modelRows = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
+  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
+  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
+  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
+  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
+  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
+  const scenCols = scenarios.map(s => {
+    const r = mrs.find(r => r.scenario === s);
+    if (!r) return '<td>-</td>';
+    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
+  }).join('');
+  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
+}).sort((a,b) => b.avg - a.avg);
+sumBody.innerHTML = modelRows.map(r => r.html).join('');
+
+// Results table
+const resHead = document.querySelector('#results-table thead');
+const resBody = document.querySelector('#results-table tbody');
+const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
+resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
+
+let sortCol = 9, sortAsc = false;
+function renderResults() {
+  const sorted = [...DATA].sort((a,b) => {
+    const vals = [
+      [a.model, b.model],
+      [a.scenario, b.scenario],
+      [a.specEntities, b.specEntities],
+      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
+      [a.fixRounds, b.fixRounds],
+      [a.promptTokensEst, b.promptTokensEst],
+      [a.totalTokens, b.totalTokens],
+      [a.totalDurationMs, b.totalDurationMs],
+      [a.avgTokPerSec, b.avgTokPerSec],
+      [a.score, b.score],
+    ][sortCol];
+    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
+    return sortAsc ? cmp : -cmp;
+  });
+  resBody.innerHTML = sorted.map(r => {
+    const c = cls(r);
+    return `<tr>
+      <td class="model-name">${r.model}</td>
+      <td>${r.scenario}</td>
+      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
+      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
+      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
+      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
+      <td>${r.avgTokPerSec.toFixed(0)}</td>
+      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
+    </tr>`;
+  }).join('');
+  document.querySelectorAll('#results-table th').forEach((th,i) => {
+    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
+  });
+}
+document.querySelector('#results-table thead').addEventListener('click', e => {
+  const col = parseInt(e.target.dataset.col, 10);
+  if (Number.isNaN(col)) return;
+  if (sortCol === col) sortAsc = !sortAsc;
+  else { sortCol = col; sortAsc = false; }
+  renderResults();
+});
+renderResults();
+</script>
+</body>
+</html>
--- a/kipina-codebench/results/2026-04-14T07-18.json
+++ b/kipina-codebench/results/2026-04-14T07-18.json
@@ -0,0 +1,122 @@
+[
+  {
+    "model": "qwen3:14b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 1,
+    "testsPassed": 0,
+    "testsFailed": 1,
+    "totalDurationMs": 66903,
+    "totalTokens": 5454,
+    "avgTokPerSec": 86.45918994499432,
+    "promptChars": 9985,
+    "promptTokensEst": 2496,
+    "score": 40,
+    "stars": "★★☆☆☆",
+    "error": null
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 1,
+    "testsPassed": 0,
+    "testsFailed": 1,
+    "totalDurationMs": 87618,
+    "totalTokens": 7150,
+    "avgTokPerSec": 87.21782190501095,
+    "promptChars": 9922,
+    "promptTokensEst": 2481,
+    "score": 40,
+    "stars": "★★☆☆☆",
+    "error": null
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 9,
+    "testsPassed": 5,
+    "testsFailed": 4,
+    "totalDurationMs": 78398,
+    "totalTokens": 6427,
+    "avgTokPerSec": 85.52353711143463,
+    "promptChars": 10737,
+    "promptTokensEst": 2684,
+    "score": 73,
+    "stars": "★★★★☆",
+    "error": null
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 8,
+    "testsPassed": 7,
+    "testsFailed": 1,
+    "totalDurationMs": 82750,
+    "totalTokens": 10054,
+    "avgTokPerSec": 139.90690936146032,
+    "promptChars": 9360,
+    "promptTokensEst": 2340,
+    "score": 93,
+    "stars": "★★★★★",
+    "error": null
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 32233,
+    "totalTokens": 4404,
+    "avgTokPerSec": 143.4997404058814,
+    "promptChars": 9310,
+    "promptTokensEst": 2328,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 1,
+    "testsPassed": 0,
+    "testsFailed": 1,
+    "totalDurationMs": 88563,
+    "totalTokens": 11575,
+    "avgTokPerSec": 141.54675017528362,
+    "promptChars": 10567,
+    "promptTokensEst": 2642,
+    "score": 40,
+    "stars": "★★☆☆☆",
+    "error": null
+  }
+]
--- a/kipina-codebench/results/2026-04-14T07-55.html
+++ b/kipina-codebench/results/2026-04-14T07-55.html
@@ -0,0 +1,183 @@
+<!DOCTYPE html>
+<html lang="fi">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Kipina Model Benchmark</title>
+<style>
+  :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
+  * { box-sizing: border-box; margin: 0; padding: 0; }
+  body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
+  h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
+  .meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
+  .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
+  .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
+  .card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
+  .card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
+  .card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
+  table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
+  th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
+  th:hover { color: var(--text); }
+  th.sorted-asc::after { content: ' ▲'; }
+  th.sorted-desc::after { content: ' ▼'; }
+  td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
+  tr:hover td { background: #1c2128; }
+  .pass { color: var(--green); }
+  .partial { color: var(--yellow); }
+  .fail { color: var(--red); }
+  .stars { letter-spacing: 1px; }
+  .bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
+  .bar-bg { background: var(--border); }
+  .bar-fill { background: var(--green); }
+  .bar-partial { background: var(--yellow); }
+  .model-name { font-weight: 600; }
+  h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
+  .summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
+</style>
+</head>
+<body>
+
+<h1>Kipina Model Benchmark</h1>
+<div class="meta" id="meta"></div>
+
+<div class="cards" id="cards"></div>
+
+<h2>Mallikohtainen yhteenveto</h2>
+<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>
+
+<h2>Kaikki tulokset</h2>
+<table id="results-table"><thead></thead><tbody></tbody></table>
+
+<script>
+const RAW = [{"model":"qwen3:14b","scenario":"todo","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":9,"testsPassed":6,"testsFailed":3,"totalDurationMs":50350,"totalTokens":2797,"avgTokPerSec":60.919860198859574,"promptChars":9858,"promptTokensEst":2465,"score":80,"stars":"★★★★☆","error":null},{"model":"qwen3:14b","scenario":"users","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":8,"testsPassed":6,"testsFailed":2,"totalDurationMs":46557,"totalTokens":2584,"avgTokPerSec":60.88834523948,"promptChars":9544,"promptTokensEst":2386,"score":85,"stars":"★★★★☆","error":null},{"model":"qwen3:14b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":15,"testsPassed":2,"testsFailed":13,"totalDurationMs":90761,"totalTokens":4979,"avgTokPerSec":60.19247492391319,"promptChars":10521,"promptTokensEst":2630,"score":48,"stars":"★★☆☆☆","error":null},{"model":"qwen3:8b","scenario":"todo","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":27360,"totalTokens":2466,"avgTokPerSec":100.9922018173994,"promptChars":9767,"promptTokensEst":2442,"score":0,"stars":"☆☆☆☆☆","error":"Testit kaatuivat"},{"model":"qwen3:8b","scenario":"users","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":7,"testsPassed":7,"testsFailed":0,"totalDurationMs":20920,"totalTokens":1876,"avgTokPerSec":101.60760023892685,"promptChars":8782,"promptTokensEst":2196,"score":100,"stars":"★★★★★","error":null},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":10,"testsPassed":9,"testsFailed":1,"totalDurationMs":35766,"totalTokens":3217,"avgTokPerSec":100.40066102398943,"promptChars":10334,"promptTokensEst":2584,"score":94,"stars":"★★★★★","error":null}];
+
+const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
+function calcScore(r) {
+  if (r.error && r.testsTotal === 0) return 0; // errored and no tests were collected
+  let s = 0;
+  if (r.specOk) s += 10; // 10 p: spec OK
+  if (!r.error || r.testsTotal > 0) s += 10; // 10 p: no error, or tests still ran
+  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60); // up to 60 p: test pass rate
+  s += Math.max(0, 20 - (r.fixRounds || 0) * 10); // up to 20 p: minus 10 per fix round
+  return Math.min(100, s);
+}
+// Compute scores if missing
+const DATA = RAW.map(r => {
+  if (r.score == null) r.score = calcScore(r);
+  if (!r.stars) r.stars = starsFor(r.score);
+  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
+  return r;
+});
+const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
+const pctBar = (passed, total, w=80) => {
+  if (total === 0) return '-';
+  const pct = passed/total*100;
+  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
+  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
+};
+
+// Meta
+const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
+document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;
+
+// Cards
+const models = [...new Set(DATA.map(r => r.model))];
+const scenarios = [...new Set(DATA.map(r => r.scenario))];
+const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
+const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
+const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
+const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
+const bestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
+}).sort((a,b) => b.avg - a.avg)[0];
+const fastestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
+}).sort((a,b) => b.speed - a.speed)[0];
+
+document.getElementById('cards').innerHTML = `
+  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistettä</div></div>
+  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
+  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
+  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
+  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
+  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
+`;
+
+// Summary table
+const sumHead = document.querySelector('#summary-table thead');
+const sumBody = document.querySelector('#summary-table tbody');
+sumHead.innerHTML = '<tr><th>Model</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Total</th><th>Out tok</th><th>Time</th><th>tok/s</th><th>Score</th></tr>';
+
+const modelRows = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
+  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
+  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
+  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
+  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
+  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
+  const scenCols = scenarios.map(s => {
+    const r = mrs.find(r => r.scenario === s);
+    if (!r) return '<td>-</td>';
+    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
+  }).join('');
+  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
+}).sort((a,b) => b.avg - a.avg);
+sumBody.innerHTML = modelRows.map(r => r.html).join('');
+
+// Results table
+const resHead = document.querySelector('#results-table thead');
+const resBody = document.querySelector('#results-table tbody');
+const resCols = ['Model','Scenario','Spec','Tests','Fixes','Ctx','Out tok','Time','tok/s','Score'];
+resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
+
+let sortCol = 9, sortAsc = false;
+function renderResults() {
+  const sorted = [...DATA].sort((a,b) => {
+    const vals = [
+      [a.model, b.model],
+      [a.scenario, b.scenario],
+      [a.specEntities, b.specEntities],
+      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
+      [a.fixRounds, b.fixRounds],
+      [a.promptTokensEst, b.promptTokensEst],
+      [a.totalTokens, b.totalTokens],
+      [a.totalDurationMs, b.totalDurationMs],
+      [a.avgTokPerSec, b.avgTokPerSec],
+      [a.score, b.score],
+    ][sortCol];
+    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
+    return sortAsc ? cmp : -cmp;
+  });
+  resBody.innerHTML = sorted.map(r => {
+    const c = cls(r);
+    return `<tr>
+      <td class="model-name">${r.model}</td>
+      <td>${r.scenario}</td>
+      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
+      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
+      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
+      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
+      <td>${r.avgTokPerSec.toFixed(0)}</td>
+      <td><span class="stars">${r.stars || starsFor(r.score)}</span> ${r.score}p</td>
+    </tr>`;
+  }).join('');
+  document.querySelectorAll('#results-table th').forEach((th,i) => {
+    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
+  });
+}
+document.querySelector('#results-table thead').addEventListener('click', e => {
+  const col = parseInt(e.target.dataset.col, 10);
+  if (isNaN(col)) return;
+  if (sortCol === col) sortAsc = !sortAsc;
+  else { sortCol = col; sortAsc = false; }
+  renderResults();
+});
+renderResults();
+</script>
+</body>
+</html>
--- a/kipina-codebench/results/2026-04-14T07-55.json
+++ b/kipina-codebench/results/2026-04-14T07-55.json
@@ -0,0 +1,122 @@
+[
+  {
+    "model": "qwen3:14b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 9,
+    "testsPassed": 6,
+    "testsFailed": 3,
+    "totalDurationMs": 50350,
+    "totalTokens": 2797,
+    "avgTokPerSec": 60.919860198859574,
+    "promptChars": 9858,
+    "promptTokensEst": 2465,
+    "score": 80,
+    "stars": "★★★★☆",
+    "error": null
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 8,
+    "testsPassed": 6,
+    "testsFailed": 2,
+    "totalDurationMs": 46557,
+    "totalTokens": 2584,
+    "avgTokPerSec": 60.88834523948,
+    "promptChars": 9544,
+    "promptTokensEst": 2386,
+    "score": 85,
+    "stars": "★★★★☆",
+    "error": null
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 15,
+    "testsPassed": 2,
+    "testsFailed": 13,
+    "totalDurationMs": 90761,
+    "totalTokens": 4979,
+    "avgTokPerSec": 60.19247492391319,
+    "promptChars": 10521,
+    "promptTokensEst": 2630,
+    "score": 48,
+    "stars": "★★☆☆☆",
+    "error": null
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 27360,
+    "totalTokens": 2466,
+    "avgTokPerSec": 100.9922018173994,
+    "promptChars": 9767,
+    "promptTokensEst": 2442,
+    "score": 0,
+    "stars": "☆☆☆☆☆",
+    "error": "Testit kaatuivat"
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 7,
+    "testsPassed": 7,
+    "testsFailed": 0,
+    "totalDurationMs": 20920,
+    "totalTokens": 1876,
+    "avgTokPerSec": 101.60760023892685,
+    "promptChars": 8782,
+    "promptTokensEst": 2196,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 10,
+    "testsPassed": 9,
+    "testsFailed": 1,
+    "totalDurationMs": 35766,
+    "totalTokens": 3217,
+    "avgTokPerSec": 100.40066102398943,
+    "promptChars": 10334,
+    "promptTokensEst": 2584,
+    "score": 94,
+    "stars": "★★★★★",
+    "error": null
+  }
+]
--- a/kipina-codebench/results/2026-04-14T08-05.html
+++ b/kipina-codebench/results/2026-04-14T08-05.html
--- a/kipina-codebench/results/2026-04-14T08-05.json
+++ b/kipina-codebench/results/2026-04-14T08-05.json
@@ -0,0 +1,947 @@
+[
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 1,
+    "testsFailed": 5,
+    "totalDurationMs": 30801,
+    "totalTokens": 2333,
+    "avgTokPerSec": 122.77922150989748,
+    "promptChars": 10015,
+    "promptTokensEst": 2504,
+    "score": 50,
+    "stars": "★★★☆☆",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 7,
+    "testsPassed": 6,
+    "testsFailed": 1,
+    "totalDurationMs": 25495,
+    "totalTokens": 2714,
+    "avgTokPerSec": 122.70970007652487,
+    "promptChars": 9891,
+    "promptTokensEst": 2473,
+    "score": 91,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 11,
+    "testsPassed": 10,
+    "testsFailed": 1,
+    "totalDurationMs": 37153,
+    "totalTokens": 3979,
+    "avgTokPerSec": 121.9183958236036,
+    "promptChars": 11158,
+    "promptTokensEst": 2790,
+    "score": 95,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 7,
+    "testsPassed": 6,
+    "testsFailed": 1,
+    "totalDurationMs": 43456,
+    "totalTokens": 2411,
+    "avgTokPerSec": 60.89226084568145,
+    "promptChars": 9831,
+    "promptTokensEst": 2458,
+    "score": 91,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 8,
+    "testsPassed": 8,
+    "testsFailed": 0,
+    "totalDurationMs": 40376,
+    "totalTokens": 2237,
+    "avgTokPerSec": 61.028627032662456,
+    "promptChars": 9343,
+    "promptTokensEst": 2336,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 12,
+    "testsPassed": 2,
+    "testsFailed": 10,
+    "totalDurationMs": 68620,
+    "totalTokens": 3796,
+    "avgTokPerSec": 60.47793268944476,
+    "promptChars": 10497,
+    "promptTokensEst": 2624,
+    "score": 50,
+    "stars": "★★★☆☆",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 7,
+    "testsPassed": 7,
+    "testsFailed": 0,
+    "totalDurationMs": 25235,
+    "totalTokens": 2269,
+    "avgTokPerSec": 101.24212769079884,
+    "promptChars": 9294,
+    "promptTokensEst": 2324,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 8,
+    "testsPassed": 6,
+    "testsFailed": 2,
+    "totalDurationMs": 21720,
+    "totalTokens": 1942,
+    "avgTokPerSec": 101.65074583709965,
+    "promptChars": 9020,
+    "promptTokensEst": 2255,
+    "score": 85,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 11,
+    "testsPassed": 10,
+    "testsFailed": 1,
+    "totalDurationMs": 39006,
+    "totalTokens": 3509,
+    "avgTokPerSec": 100.43593706181406,
+    "promptChars": 10372,
+    "promptTokensEst": 2593,
+    "score": 95,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 21989,
+    "totalTokens": 2339,
+    "avgTokPerSec": 122.8454095677367,
+    "promptChars": 10052,
+    "promptTokensEst": 2513,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 23997,
+    "totalTokens": 2551,
+    "avgTokPerSec": 122.23722733560855,
+    "promptChars": 9973,
+    "promptTokensEst": 2493,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 8,
+    "testsPassed": 8,
+    "testsFailed": 0,
+    "totalDurationMs": 30169,
+    "totalTokens": 3249,
+    "avgTokPerSec": 123.04696524796096,
+    "promptChars": 11097,
+    "promptTokensEst": 2774,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 9,
+    "testsPassed": 6,
+    "testsFailed": 3,
+    "totalDurationMs": 47091,
+    "totalTokens": 2602,
+    "avgTokPerSec": 60.962687726457375,
+    "promptChars": 9633,
+    "promptTokensEst": 2408,
+    "score": 80,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 7,
+    "testsPassed": 7,
+    "testsFailed": 0,
+    "totalDurationMs": 41747,
+    "totalTokens": 2313,
+    "avgTokPerSec": 60.949025583617605,
+    "promptChars": 9373,
+    "promptTokensEst": 2343,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 12,
+    "testsPassed": 2,
+    "testsFailed": 10,
+    "totalDurationMs": 66888,
+    "totalTokens": 3699,
+    "avgTokPerSec": 60.49540514685331,
+    "promptChars": 10323,
+    "promptTokensEst": 2581,
+    "score": 50,
+    "stars": "★★★☆☆",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 8,
+    "testsPassed": 7,
+    "testsFailed": 1,
+    "totalDurationMs": 27036,
+    "totalTokens": 2434,
+    "avgTokPerSec": 101.01399069228444,
+    "promptChars": 9513,
+    "promptTokensEst": 2378,
+    "score": 93,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 7,
+    "testsPassed": 6,
+    "testsFailed": 1,
+    "totalDurationMs": 20927,
+    "totalTokens": 1872,
+    "avgTokPerSec": 101.45096098956486,
+    "promptChars": 8881,
+    "promptTokensEst": 2220,
+    "score": 91,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": false,
+    "specEntities": 0,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 0,
+    "promptTokensEst": 0,
+    "score": 0,
+    "stars": "",
+    "error": "JSON-speksi epäonnistui",
+    "round": 2
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 8,
+    "testsPassed": 6,
+    "testsFailed": 2,
+    "totalDurationMs": 26919,
+    "totalTokens": 2889,
+    "avgTokPerSec": 123.63666629145064,
+    "promptChars": 10162,
+    "promptTokensEst": 2541,
+    "score": 85,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 8,
+    "testsPassed": 8,
+    "testsFailed": 0,
+    "totalDurationMs": 27592,
+    "totalTokens": 2946,
+    "avgTokPerSec": 122.33273400152825,
+    "promptChars": 9469,
+    "promptTokensEst": 2367,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 11,
+    "testsPassed": 11,
+    "testsFailed": 0,
+    "totalDurationMs": 35734,
+    "totalTokens": 3827,
+    "avgTokPerSec": 122.65156559717951,
+    "promptChars": 11086,
+    "promptTokensEst": 2772,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 9,
+    "testsPassed": 6,
+    "testsFailed": 3,
+    "totalDurationMs": 50372,
+    "totalTokens": 2795,
+    "avgTokPerSec": 60.91611850918806,
+    "promptChars": 9758,
+    "promptTokensEst": 2440,
+    "score": 80,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 1,
+    "testsFailed": 5,
+    "totalDurationMs": 38716,
+    "totalTokens": 2144,
+    "avgTokPerSec": 61.0412890406478,
+    "promptChars": 9415,
+    "promptTokensEst": 2354,
+    "score": 50,
+    "stars": "★★★☆☆",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 14,
+    "testsPassed": 7,
+    "testsFailed": 7,
+    "totalDurationMs": 74882,
+    "totalTokens": 4130,
+    "avgTokPerSec": 60.32640855026445,
+    "promptChars": 10506,
+    "promptTokensEst": 2627,
+    "score": 70,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 3,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 35913,
+    "totalTokens": 3218,
+    "avgTokPerSec": 100.38516205100154,
+    "promptChars": 11338,
+    "promptTokensEst": 2835,
+    "score": 0,
+    "stars": "☆☆☆☆☆",
+    "error": "Testit kaatuivat",
+    "round": 3
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 7,
+    "testsPassed": 7,
+    "testsFailed": 0,
+    "totalDurationMs": 20974,
+    "totalTokens": 1880,
+    "avgTokPerSec": 101.52450928280543,
+    "promptChars": 8803,
+    "promptTokensEst": 2201,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 11,
+    "testsPassed": 9,
+    "testsFailed": 2,
+    "totalDurationMs": 36005,
+    "totalTokens": 3243,
+    "avgTokPerSec": 100.44301406462307,
+    "promptChars": 10414,
+    "promptTokensEst": 2604,
+    "score": 89,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 7,
+    "testsPassed": 1,
+    "testsFailed": 6,
+    "totalDurationMs": 23071,
+    "totalTokens": 2469,
+    "avgTokPerSec": 124.09643322620661,
+    "promptChars": 9960,
+    "promptTokensEst": 2490,
+    "score": 49,
+    "stars": "★★☆☆☆",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 8,
+    "testsPassed": 2,
+    "testsFailed": 6,
+    "totalDurationMs": 27062,
+    "totalTokens": 2907,
+    "avgTokPerSec": 123.35530975346687,
+    "promptChars": 9558,
+    "promptTokensEst": 2390,
+    "score": 55,
+    "stars": "★★★☆☆",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 9,
+    "testsPassed": 9,
+    "testsFailed": 0,
+    "totalDurationMs": 29395,
+    "totalTokens": 3156,
+    "avgTokPerSec": 123.22575073561812,
+    "promptChars": 10574,
+    "promptTokensEst": 2644,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 39590,
+    "totalTokens": 2198,
+    "avgTokPerSec": 61.051945510465806,
+    "promptChars": 9664,
+    "promptTokensEst": 2416,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 1,
+    "testsFailed": 5,
+    "totalDurationMs": 36950,
+    "totalTokens": 2042,
+    "avgTokPerSec": 61.01436784429489,
+    "promptChars": 9225,
+    "promptTokensEst": 2306,
+    "score": 50,
+    "stars": "★★★☆☆",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 14,
+    "testsPassed": 2,
+    "testsFailed": 12,
+    "totalDurationMs": 80600,
+    "totalTokens": 4437,
+    "avgTokPerSec": 60.29371170543078,
+    "promptChars": 10688,
+    "promptTokensEst": 2672,
+    "score": 49,
+    "stars": "★★☆☆☆",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 29125,
+    "totalTokens": 2619,
+    "avgTokPerSec": 100.90587777586212,
+    "promptChars": 9899,
+    "promptTokensEst": 2475,
+    "score": 0,
+    "stars": "☆☆☆☆☆",
+    "error": "Testit kaatuivat",
+    "round": 4
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 8,
+    "testsPassed": 8,
+    "testsFailed": 0,
+    "totalDurationMs": 21847,
+    "totalTokens": 1957,
+    "avgTokPerSec": 101.44111070734304,
+    "promptChars": 8946,
+    "promptTokensEst": 2237,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": false,
+    "specEntities": 0,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 0,
+    "promptTokensEst": 0,
+    "score": 0,
+    "stars": "",
+    "error": "JSON-speksi epäonnistui",
+    "round": 4
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 1,
+    "testsFailed": 5,
+    "totalDurationMs": 21127,
+    "totalTokens": 2245,
+    "avgTokPerSec": 124.22714049663371,
+    "promptChars": 9972,
+    "promptTokensEst": 2493,
+    "score": 50,
+    "stars": "★★★☆☆",
+    "error": null,
+    "round": 5
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 9,
+    "testsPassed": 7,
+    "testsFailed": 2,
+    "totalDurationMs": 30281,
+    "totalTokens": 3079,
+    "avgTokPerSec": 123.00254714651271,
+    "promptChars": 9562,
+    "promptTokensEst": 2391,
+    "score": 87,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 5
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 12,
+    "testsPassed": 12,
+    "testsFailed": 0,
+    "totalDurationMs": 39630,
+    "totalTokens": 4274,
+    "avgTokPerSec": 123.08303937451802,
+    "promptChars": 11119,
+    "promptTokensEst": 2780,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 5
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 38032,
+    "totalTokens": 2104,
+    "avgTokPerSec": 61.05445464163662,
+    "promptChars": 9455,
+    "promptTokensEst": 2364,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 5
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 39620,
+    "totalTokens": 2193,
+    "avgTokPerSec": 61.04565233675101,
+    "promptChars": 9481,
+    "promptTokensEst": 2370,
+    "score": 0,
+    "stars": "☆☆☆☆☆",
+    "error": "Testit kaatuivat",
+    "round": 5
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 9,
+    "testsPassed": 7,
+    "testsFailed": 2,
+    "totalDurationMs": 63579,
+    "totalTokens": 3520,
+    "avgTokPerSec": 60.51513453009977,
+    "promptChars": 10493,
+    "promptTokensEst": 2623,
+    "score": 87,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 5
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 9,
+    "testsPassed": 6,
+    "testsFailed": 3,
+    "totalDurationMs": 30845,
+    "totalTokens": 2777,
+    "avgTokPerSec": 100.79046137130972,
+    "promptChars": 9507,
+    "promptTokensEst": 2377,
+    "score": 80,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 5
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 8,
+    "testsPassed": 6,
+    "testsFailed": 2,
+    "totalDurationMs": 21413,
+    "totalTokens": 1914,
+    "avgTokPerSec": 101.25525436264132,
+    "promptChars": 8804,
+    "promptTokensEst": 2201,
+    "score": 85,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 5
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": false,
+    "specEntities": 0,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 0,
+    "promptTokensEst": 0,
+    "score": 0,
+    "stars": "",
+    "error": "JSON-speksi epäonnistui",
+    "round": 5
+  }
+]
--- a/kipina-codebench/results/2026-04-14T08-18.html
+++ b/kipina-codebench/results/2026-04-14T08-18.html
--- a/kipina-codebench/results/2026-04-14T08-18.json
+++ b/kipina-codebench/results/2026-04-14T08-18.json
@@ -0,0 +1,947 @@
+[
+  {
+    "model": "qwen3:14b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 9,
+    "testsPassed": 6,
+    "testsFailed": 3,
+    "totalDurationMs": 33892,
+    "totalTokens": 2675,
+    "avgTokPerSec": 88.07409036121237,
+    "promptChars": 9688,
+    "promptTokensEst": 2422,
+    "score": 80,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 8,
+    "testsPassed": 6,
+    "testsFailed": 2,
+    "totalDurationMs": 30647,
+    "totalTokens": 2549,
+    "avgTokPerSec": 88.4488185974085,
+    "promptChars": 9594,
+    "promptTokensEst": 2399,
+    "score": 85,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 13,
+    "testsPassed": 6,
+    "testsFailed": 7,
+    "totalDurationMs": 44371,
+    "totalTokens": 3678,
+    "avgTokPerSec": 88.172616246191,
+    "promptChars": 10432,
+    "promptTokensEst": 2608,
+    "score": 68,
+    "stars": "★★★☆☆",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 7,
+    "testsPassed": 6,
+    "testsFailed": 1,
+    "totalDurationMs": 18385,
+    "totalTokens": 2375,
+    "avgTokPerSec": 147.62230806597154,
+    "promptChars": 9478,
+    "promptTokensEst": 2370,
+    "score": 91,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 7,
+    "testsPassed": 7,
+    "testsFailed": 0,
+    "totalDurationMs": 13968,
+    "totalTokens": 1904,
+    "avgTokPerSec": 148.3084817167518,
+    "promptChars": 8837,
+    "promptTokensEst": 2209,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 25642,
+    "totalTokens": 3476,
+    "avgTokPerSec": 146.49556892944076,
+    "promptChars": 10734,
+    "promptTokensEst": 2684,
+    "score": 0,
+    "stars": "☆☆☆☆☆",
+    "error": "Testit kaatuivat",
+    "round": 1
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 8,
+    "testsPassed": 6,
+    "testsFailed": 2,
+    "totalDurationMs": 19982,
+    "totalTokens": 2937,
+    "avgTokPerSec": 191.2786317674431,
+    "promptChars": 10281,
+    "promptTokensEst": 2570,
+    "score": 85,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 7,
+    "testsPassed": 7,
+    "testsFailed": 0,
+    "totalDurationMs": 17114,
+    "totalTokens": 2903,
+    "avgTokPerSec": 190.51221206765385,
+    "promptChars": 9654,
+    "promptTokensEst": 2414,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 11,
+    "testsPassed": 11,
+    "testsFailed": 0,
+    "totalDurationMs": 22352,
+    "totalTokens": 3776,
+    "avgTokPerSec": 190.56628728306987,
+    "promptChars": 11134,
+    "promptTokensEst": 2784,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 8,
+    "testsPassed": 6,
+    "testsFailed": 2,
+    "totalDurationMs": 31217,
+    "totalTokens": 2463,
+    "avgTokPerSec": 88.6684646675098,
+    "promptChars": 9598,
+    "promptTokensEst": 2400,
+    "score": 85,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 7,
+    "testsPassed": 7,
+    "testsFailed": 0,
+    "totalDurationMs": 27520,
+    "totalTokens": 2288,
+    "avgTokPerSec": 88.64765360012593,
+    "promptChars": 9612,
+    "promptTokensEst": 2403,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 12,
+    "testsPassed": 3,
+    "testsFailed": 9,
+    "totalDurationMs": 41874,
+    "totalTokens": 3474,
+    "avgTokPerSec": 88.22266853318554,
+    "promptChars": 10408,
+    "promptTokensEst": 2602,
+    "score": 55,
+    "stars": "★★★☆☆",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 11,
+    "testsPassed": 11,
+    "testsFailed": 0,
+    "totalDurationMs": 24781,
+    "totalTokens": 3240,
+    "avgTokPerSec": 146.89167309934365,
+    "promptChars": 10179,
+    "promptTokensEst": 2545,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 9,
+    "testsPassed": 6,
+    "testsFailed": 3,
+    "totalDurationMs": 19148,
+    "totalTokens": 2605,
+    "avgTokPerSec": 147.55250620481297,
+    "promptChars": 9634,
+    "promptTokensEst": 2409,
+    "score": 80,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 11,
+    "testsPassed": 11,
+    "testsFailed": 0,
+    "totalDurationMs": 23816,
+    "totalTokens": 3232,
+    "avgTokPerSec": 147.25857324533817,
+    "promptChars": 9226,
+    "promptTokensEst": 2307,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 16639,
+    "totalTokens": 2369,
+    "avgTokPerSec": 191.61273045157245,
+    "promptChars": 10048,
+    "promptTokensEst": 2512,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 9,
+    "testsPassed": 8,
+    "testsFailed": 1,
+    "totalDurationMs": 18588,
+    "totalTokens": 3163,
+    "avgTokPerSec": 190.86975006725547,
+    "promptChars": 10048,
+    "promptTokensEst": 2512,
+    "score": 93,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 10,
+    "testsPassed": 10,
+    "testsFailed": 0,
+    "totalDurationMs": 22677,
+    "totalTokens": 3828,
+    "avgTokPerSec": 190.15611016906482,
+    "promptChars": 11090,
+    "promptTokensEst": 2773,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 26449,
+    "totalTokens": 2063,
+    "avgTokPerSec": 88.77498453063184,
+    "promptChars": 9608,
+    "promptTokensEst": 2402,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 7,
+    "testsPassed": 7,
+    "testsFailed": 0,
+    "totalDurationMs": 27510,
+    "totalTokens": 2289,
+    "avgTokPerSec": 88.74699253414485,
+    "promptChars": 9418,
+    "promptTokensEst": 2355,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 12,
+    "testsPassed": 3,
+    "testsFailed": 9,
+    "totalDurationMs": 45105,
+    "totalTokens": 3738,
+    "avgTokPerSec": 88.04788102995212,
+    "promptChars": 10564,
+    "promptTokensEst": 2641,
+    "score": 55,
+    "stars": "★★★☆☆",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 8,
+    "testsPassed": 7,
+    "testsFailed": 1,
+    "totalDurationMs": 19204,
+    "totalTokens": 2480,
+    "avgTokPerSec": 147.91758782382294,
+    "promptChars": 9391,
+    "promptTokensEst": 2348,
+    "score": 93,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 12990,
+    "totalTokens": 1769,
+    "avgTokPerSec": 148.2616673700717,
+    "promptChars": 8898,
+    "promptTokensEst": 2225,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 12,
+    "testsPassed": 10,
+    "testsFailed": 2,
+    "totalDurationMs": 25831,
+    "totalTokens": 3500,
+    "avgTokPerSec": 146.86924785880186,
+    "promptChars": 9465,
+    "promptTokensEst": 2366,
+    "score": 90,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 19453,
+    "totalTokens": 2845,
+    "avgTokPerSec": 191.37382231956113,
+    "promptChars": 10157,
+    "promptTokensEst": 2539,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 9,
+    "testsPassed": 9,
+    "testsFailed": 0,
+    "totalDurationMs": 21570,
+    "totalTokens": 3529,
+    "avgTokPerSec": 190.65454623497536,
+    "promptChars": 9732,
+    "promptTokensEst": 2433,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 11,
+    "testsPassed": 11,
+    "testsFailed": 0,
+    "totalDurationMs": 25537,
+    "totalTokens": 4300,
+    "avgTokPerSec": 189.94521619124598,
+    "promptChars": 11127,
+    "promptTokensEst": 2782,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 9,
+    "testsPassed": 7,
+    "testsFailed": 2,
+    "totalDurationMs": 31923,
+    "totalTokens": 2522,
+    "avgTokPerSec": 88.62182881661799,
+    "promptChars": 9700,
+    "promptTokensEst": 2425,
+    "score": 87,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 7,
+    "testsPassed": 7,
+    "testsFailed": 0,
+    "totalDurationMs": 26000,
+    "totalTokens": 2163,
+    "avgTokPerSec": 88.86878707672254,
+    "promptChars": 9288,
+    "promptTokensEst": 2322,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 10,
+    "testsPassed": 10,
+    "testsFailed": 0,
+    "totalDurationMs": 43275,
+    "totalTokens": 3588,
+    "avgTokPerSec": 88.24995936347965,
+    "promptChars": 10173,
+    "promptTokensEst": 2543,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 14,
+    "testsPassed": 0,
+    "testsFailed": 14,
+    "totalDurationMs": 30045,
+    "totalTokens": 3913,
+    "avgTokPerSec": 146.51683619371713,
+    "promptChars": 10334,
+    "promptTokensEst": 2584,
+    "score": 40,
+    "stars": "★★☆☆☆",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 9,
+    "testsPassed": 5,
+    "testsFailed": 4,
+    "totalDurationMs": 17076,
+    "totalTokens": 2321,
+    "avgTokPerSec": 147.99547121069506,
+    "promptChars": 9451,
+    "promptTokensEst": 2363,
+    "score": 73,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 11,
+    "testsPassed": 11,
+    "testsFailed": 0,
+    "totalDurationMs": 23890,
+    "totalTokens": 3243,
+    "avgTokPerSec": 147.20125507974117,
+    "promptChars": 9217,
+    "promptTokensEst": 2304,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 8,
+    "testsPassed": 6,
+    "testsFailed": 2,
+    "totalDurationMs": 21812,
+    "totalTokens": 3246,
+    "avgTokPerSec": 191.07801335688654,
+    "promptChars": 10249,
+    "promptTokensEst": 2562,
+    "score": 85,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 9,
+    "testsPassed": 8,
+    "testsFailed": 1,
+    "totalDurationMs": 20325,
+    "totalTokens": 3441,
+    "avgTokPerSec": 190.10241840094508,
+    "promptChars": 9930,
+    "promptTokensEst": 2483,
+    "score": 93,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 12,
+    "testsPassed": 12,
+    "testsFailed": 0,
+    "totalDurationMs": 26087,
+    "totalTokens": 4387,
+    "avgTokPerSec": 189.8005689388054,
+    "promptChars": 11109,
+    "promptTokensEst": 2777,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 7,
+    "testsPassed": 7,
+    "testsFailed": 0,
+    "totalDurationMs": 30287,
+    "totalTokens": 2388,
+    "avgTokPerSec": 88.72243320918638,
+    "promptChars": 9695,
+    "promptTokensEst": 2424,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 5
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 9,
+    "testsPassed": 6,
+    "testsFailed": 3,
+    "totalDurationMs": 31212,
+    "totalTokens": 2601,
+    "avgTokPerSec": 88.71289036919063,
+    "promptChars": 9619,
+    "promptTokensEst": 2405,
+    "score": 80,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 5
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 15,
+    "testsPassed": 3,
+    "testsFailed": 12,
+    "totalDurationMs": 50939,
+    "totalTokens": 4217,
+    "avgTokPerSec": 88.06125722020734,
+    "promptChars": 10743,
+    "promptTokensEst": 2686,
+    "score": 52,
+    "stars": "★★★☆☆",
+    "error": null,
+    "round": 5
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 7,
+    "testsPassed": 6,
+    "testsFailed": 1,
+    "totalDurationMs": 17913,
+    "totalTokens": 2310,
+    "avgTokPerSec": 148.0291268001691,
+    "promptChars": 9357,
+    "promptTokensEst": 2339,
+    "score": 91,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 5
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 8,
+    "testsPassed": 8,
+    "testsFailed": 0,
+    "totalDurationMs": 13948,
+    "totalTokens": 1898,
+    "avgTokPerSec": 148.37907379944423,
+    "promptChars": 8725,
+    "promptTokensEst": 2181,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 5
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": false,
+    "specEntities": 0,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 0,
+    "promptTokensEst": 0,
+    "score": 0,
+    "stars": "",
+    "error": "JSON-speksi epäonnistui",
+    "round": 5
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 1,
+    "testsFailed": 5,
+    "totalDurationMs": 15229,
+    "totalTokens": 2119,
+    "avgTokPerSec": 192.33007410215646,
+    "promptChars": 9827,
+    "promptTokensEst": 2457,
+    "score": 50,
+    "stars": "★★★☆☆",
+    "error": null,
+    "round": 5
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 18223,
+    "totalTokens": 3093,
+    "avgTokPerSec": 190.71372054282037,
+    "promptChars": 9641,
+    "promptTokensEst": 2410,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 5
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 10,
+    "testsPassed": 1,
+    "testsFailed": 9,
+    "totalDurationMs": 21215,
+    "totalTokens": 3589,
+    "avgTokPerSec": 190.49493540345176,
+    "promptChars": 11180,
+    "promptTokensEst": 2795,
+    "score": 46,
+    "stars": "★★☆☆☆",
+    "error": null,
+    "round": 5
+  }
+]
--- a/kipina-codebench/results/2026-04-14T09-43.html
+++ b/kipina-codebench/results/2026-04-14T09-43.html
@@ -0,0 +1,183 @@
+<!DOCTYPE html>
+<html lang="fi">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Kipina Model Benchmark</title>
+<style>
+  :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
+  * { box-sizing: border-box; margin: 0; padding: 0; }
+  body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
+  h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
+  .meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
+  .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
+  .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
+  .card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
+  .card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
+  .card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
+  table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
+  th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
+  th:hover { color: var(--text); }
+  th.sorted-asc::after { content: ' ▲'; }
+  th.sorted-desc::after { content: ' ▼'; }
+  td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
+  tr:hover td { background: #1c2128; }
+  .pass { color: var(--green); }
+  .partial { color: var(--yellow); }
+  .fail { color: var(--red); }
+  .stars { letter-spacing: 1px; }
+  .bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
+  .bar-bg { background: var(--border); }
+  .bar-fill { background: var(--green); }
+  .bar-partial { background: var(--yellow); }
+  .model-name { font-weight: 600; }
+  h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
+  .summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
+</style>
+</head>
+<body>
+
+<h1>Kipina Model Benchmark</h1>
+<div class="meta" id="meta"></div>
+
+<div class="cards" id="cards"></div>
+
+<h2>Mallikohtainen yhteenveto</h2>
+<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>
+
+<h2>Kaikki tulokset</h2>
+<table id="results-table"><thead></thead><tbody></tbody></table>
+
+<script>
+const RAW = [{"model":"qwen3-coder:30b","scenario":"todo","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":6,"testsPassed":6,"testsFailed":0,"totalDurationMs":21688,"totalTokens":2243,"avgTokPerSec":121.7719614197307,"promptChars":11588,"promptTokensEst":2897,"score":100,"stars":"★★★★★","error":null}];
+
+const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
+function calcScore(r) {
+  if (r.error && r.testsTotal === 0) return 0;
+  let s = 0;
+  if (r.specOk) s += 10;
+  if (!r.error || r.testsTotal > 0) s += 10;
+  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
+  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
+  return Math.min(100, s);
+}
+// Fill in score, stars and prompt-token estimate when missing
+const DATA = RAW.map(r => {
+  if (r.score == null) r.score = calcScore(r);
+  if (!r.stars) r.stars = starsFor(r.score);
+  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
+  return r;
+});
+const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
+const pctBar = (passed, total, w=80) => {
+  if (total === 0) return '-';
+  const pct = passed/total*100;
+  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
+  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
+};
+
+// Meta
+const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
+document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;
+
+// Cards
+const models = [...new Set(DATA.map(r => r.model))];
+const scenarios = [...new Set(DATA.map(r => r.scenario))];
+const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
+const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
+const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
+const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
+const bestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
+}).sort((a,b) => b.avg - a.avg)[0];
+const fastestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
+}).sort((a,b) => b.speed - a.speed)[0];
+
+document.getElementById('cards').innerHTML = `
+  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistettä</div></div>
+  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
+  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
+  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
+  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
+  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
+`;
+
+// Summary table
+const sumHead = document.querySelector('#summary-table thead');
+const sumBody = document.querySelector('#summary-table tbody');
+sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';
+
+const modelRows = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
+  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
+  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
+  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
+  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
+  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
+  const scenCols = scenarios.map(s => {
+    const r = mrs.find(r => r.scenario === s);
+    if (!r) return '<td>-</td>';
+    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
+  }).join('');
+  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
+}).sort((a,b) => b.avg - a.avg);
+sumBody.innerHTML = modelRows.map(r => r.html).join('');
+
+// Results table
+const resHead = document.querySelector('#results-table thead');
+const resBody = document.querySelector('#results-table tbody');
+const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
+resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
+
+let sortCol = 9, sortAsc = false;
+function renderResults() {
+  const sorted = [...DATA].sort((a,b) => {
+    const vals = [
+      [a.model, b.model],
+      [a.scenario, b.scenario],
+      [a.specEntities, b.specEntities],
+      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
+      [a.fixRounds, b.fixRounds],
+      [a.promptTokensEst, b.promptTokensEst],
+      [a.totalTokens, b.totalTokens],
+      [a.totalDurationMs, b.totalDurationMs],
+      [a.avgTokPerSec, b.avgTokPerSec],
+      [a.score, b.score],
+    ][sortCol];
+    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
+    return sortAsc ? cmp : -cmp;
+  });
+  resBody.innerHTML = sorted.map(r => {
+    const c = cls(r);
+    return `<tr>
+      <td class="model-name">${r.model}</td>
+      <td>${r.scenario}</td>
+      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
+      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
+      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
+      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
+      <td>${r.avgTokPerSec.toFixed(0)}</td>
+      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
+    </tr>`;
+  }).join('');
+  document.querySelectorAll('#results-table th').forEach((th,i) => {
+    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
+  });
+}
+document.querySelector('#results-table thead').addEventListener('click', e => {
+  const col = parseInt(e.target.dataset.col, 10);
+  if (isNaN(col)) return;
+  if (sortCol === col) sortAsc = !sortAsc;
+  else { sortCol = col; sortAsc = false; }
+  renderResults();
+});
+renderResults();
+</script>
+</body>
+</html>
--- a/kipina-codebench/results/2026-04-14T09-43.json
+++ b/kipina-codebench/results/2026-04-14T09-43.json
@@ -0,0 +1,22 @@
+[
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 21688,
+    "totalTokens": 2243,
+    "avgTokPerSec": 121.7719614197307,
+    "promptChars": 11588,
+    "promptTokensEst": 2897,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null
+  }
+]
--- a/kipina-codebench/results/2026-04-14T09-44.html
+++ b/kipina-codebench/results/2026-04-14T09-44.html
@@ -0,0 +1,183 @@
+<!DOCTYPE html>
+<html lang="fi">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Kipina Model Benchmark</title>
+<style>
+  :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
+  * { box-sizing: border-box; margin: 0; padding: 0; }
+  body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
+  h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
+  .meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
+  .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
+  .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
+  .card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
+  .card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
+  .card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
+  table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
+  th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
+  th:hover { color: var(--text); }
+  th.sorted-asc::after { content: ' ▲'; }
+  th.sorted-desc::after { content: ' ▼'; }
+  td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
+  tr:hover td { background: #1c2128; }
+  .pass { color: var(--green); }
+  .partial { color: var(--yellow); }
+  .fail { color: var(--red); }
+  .stars { letter-spacing: 1px; }
+  .bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
+  .bar-bg { background: var(--border); }
+  .bar-fill { background: var(--green); }
+  .bar-partial { background: var(--yellow); }
+  .model-name { font-weight: 600; }
+  h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
+  .summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
+</style>
+</head>
+<body>
+
+<h1>Kipina Model Benchmark</h1>
+<div class="meta" id="meta"></div>
+
+<div class="cards" id="cards"></div>
+
+<h2>Per-model summary</h2>
+<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>
+
+<h2>All results</h2>
+<table id="results-table"><thead></thead><tbody></tbody></table>
+
+<script>
+const RAW = [{"model":"qwen3:8b","scenario":"todo","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":6,"testsPassed":6,"testsFailed":0,"totalDurationMs":23521,"totalTokens":2090,"avgTokPerSec":100.94324085271073,"promptChars":10962,"promptTokensEst":2741,"score":100,"stars":"★★★★★","error":null},{"model":"qwen3:8b","scenario":"users","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":1,"testsTotal":6,"testsPassed":6,"testsFailed":0,"totalDurationMs":33680,"totalTokens":3003,"avgTokPerSec":100.52754588753601,"promptChars":10171,"promptTokensEst":2543,"score":90,"stars":"★★★★★","error":null},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":false,"specEntities":0,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":0,"promptTokensEst":0,"score":0,"stars":"","error":"JSON-speksi epäonnistui"}];
+
+const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
+function calcScore(r) {
+  if (r.error && r.testsTotal === 0) return 0;
+  let s = 0;
+  if (r.specOk) s += 10;
+  if (!r.error || r.testsTotal > 0) s += 10;
+  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
+  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
+  return Math.min(100, s);
+}
+// Fill in derived fields (score, stars, prompt-token estimate) when missing
+const DATA = RAW.map(r => {
+  if (r.score == null) r.score = calcScore(r);
+  if (!r.stars) r.stars = starsFor(r.score);
+  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
+  return r;
+});
+const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
+const pctBar = (passed, total, w=80) => {
+  if (total === 0) return '-';
+  const pct = passed/total*100;
+  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
+  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
+};
+
+// Meta
+const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
+document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} - ${DATA.length} runs - ${(totalTime/1000/60).toFixed(1)} min`;
+
+// Cards
+const models = [...new Set(DATA.map(r => r.model))];
+const scenarios = [...new Set(DATA.map(r => r.scenario))];
+const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
+const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
+const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
+const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
+const bestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
+}).sort((a,b) => b.avg - a.avg)[0];
+const fastestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
+}).sort((a,b) => b.speed - a.speed)[0];
+
+document.getElementById('cards').innerHTML = `
+  <div class="card"><div class="label">Average</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} points</div></div>
+  <div class="card"><div class="label">Test pass rate</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} tests</div></div>
+  <div class="card"><div class="label">Best model</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
+  <div class="card"><div class="label">Fastest</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
+  <div class="card"><div class="label">Models</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} scenarios</div></div>
+  <div class="card"><div class="label">Total time</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minutes</div></div>
+`;
+
+// Summary table
+const sumHead = document.querySelector('#summary-table thead');
+const sumBody = document.querySelector('#summary-table tbody');
+sumHead.innerHTML = '<tr><th>Model</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Total</th><th>Out tok</th><th>Time</th><th>tok/s</th><th>Score</th></tr>';
+
+const modelRows = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
+  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
+  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
+  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
+  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
+  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
+  const scenCols = scenarios.map(s => {
+    const r = mrs.find(r => r.scenario === s);
+    if (!r) return '<td>-</td>';
+    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
+  }).join('');
+  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
+}).sort((a,b) => b.avg - a.avg);
+sumBody.innerHTML = modelRows.map(r => r.html).join('');
+
+// Results table
+const resHead = document.querySelector('#results-table thead');
+const resBody = document.querySelector('#results-table tbody');
+const resCols = ['Model','Scenario','Spec','Tests','Fixes','Ctx','Out tok','Time','tok/s','Score'];
+resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
+
+let sortCol = 9, sortAsc = false;
+function renderResults() {
+  const sorted = [...DATA].sort((a,b) => {
+    const vals = [
+      [a.model, b.model],
+      [a.scenario, b.scenario],
+      [a.specEntities, b.specEntities],
+      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
+      [a.fixRounds, b.fixRounds],
+      [a.promptTokensEst, b.promptTokensEst],
+      [a.totalTokens, b.totalTokens],
+      [a.totalDurationMs, b.totalDurationMs],
+      [a.avgTokPerSec, b.avgTokPerSec],
+      [a.score, b.score],
+    ][sortCol];
+    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
+    return sortAsc ? cmp : -cmp;
+  });
+  resBody.innerHTML = sorted.map(r => {
+    const c = cls(r);
+    return `<tr>
+      <td class="model-name">${r.model}</td>
+      <td>${r.scenario}</td>
+      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
+      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
+      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
+      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
+      <td>${r.avgTokPerSec.toFixed(0)}</td>
+      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
+    </tr>`;
+  }).join('');
+  document.querySelectorAll('#results-table th').forEach((th,i) => {
+    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
+  });
+}
+document.querySelector('#results-table thead').addEventListener('click', e => {
+  const col = parseInt(e.target.dataset.col, 10);
+  if (Number.isNaN(col)) return;
+  if (sortCol === col) sortAsc = !sortAsc;
+  else { sortCol = col; sortAsc = false; }
+  renderResults();
+});
+renderResults();
+</script>
+</body>
+</html>
--- a/kipina-codebench/results/2026-04-14T09-44.json
+++ b/kipina-codebench/results/2026-04-14T09-44.json
@@ -0,0 +1,62 @@
+[
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 23521,
+    "totalTokens": 2090,
+    "avgTokPerSec": 100.94324085271073,
+    "promptChars": 10962,
+    "promptTokensEst": 2741,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 1,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 33680,
+    "totalTokens": 3003,
+    "avgTokPerSec": 100.52754588753601,
+    "promptChars": 10171,
+    "promptTokensEst": 2543,
+    "score": 90,
+    "stars": "★★★★★",
+    "error": null
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": false,
+    "specEntities": 0,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 0,
+    "promptTokensEst": 0,
+    "score": 0,
+    "stars": "",
+    "error": "JSON-speksi epäonnistui"
+  }
+]
--- a/kipina-codebench/results/2026-04-14T09-47.html
+++ b/kipina-codebench/results/2026-04-14T09-47.html
@@ -0,0 +1,183 @@
+<!DOCTYPE html>
+<html lang="fi">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Kipina Model Benchmark</title>
+<style>
+  :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
+  * { box-sizing: border-box; margin: 0; padding: 0; }
+  body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
+  h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
+  .meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
+  .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
+  .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
+  .card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
+  .card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
+  .card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
+  table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
+  th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
+  th:hover { color: var(--text); }
+  th.sorted-asc::after { content: ' ▲'; }
+  th.sorted-desc::after { content: ' ▼'; }
+  td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
+  tr:hover td { background: #1c2128; }
+  .pass { color: var(--green); }
+  .partial { color: var(--yellow); }
+  .fail { color: var(--red); }
+  .stars { letter-spacing: 1px; }
+  .bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
+  .bar-bg { background: var(--border); }
+  .bar-fill { background: var(--green); }
+  .bar-partial { background: var(--yellow); }
+  .model-name { font-weight: 600; }
+  h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
+  .summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
+</style>
+</head>
+<body>
+
+<h1>Kipina Model Benchmark</h1>
+<div class="meta" id="meta"></div>
+
+<div class="cards" id="cards"></div>
+
+<h2>Per-model summary</h2>
+<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>
+
+<h2>All results</h2>
+<table id="results-table"><thead></thead><tbody></tbody></table>
+
+<script>
+const RAW = [{"model":"qwen3:8b","scenario":"todo","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":3,"testsTotal":8,"testsPassed":6,"testsFailed":2,"totalDurationMs":97470,"totalTokens":8786,"avgTokPerSec":97.96636139685832,"promptChars":11290,"promptTokensEst":2823,"score":65,"stars":"★★★☆☆","error":null},{"model":"qwen3:8b","scenario":"users","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":6,"testsPassed":6,"testsFailed":0,"totalDurationMs":18951,"totalTokens":1666,"avgTokPerSec":101.807593927545,"promptChars":10293,"promptTokensEst":2573,"score":100,"stars":"★★★★★","error":null},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":1,"testsPassed":0,"testsFailed":1,"totalDurationMs":126005,"totalTokens":11056,"avgTokPerSec":96.6373549161171,"promptChars":11878,"promptTokensEst":2970,"score":20,"stars":"★☆☆☆☆","error":"Syntaksivirhe"}];
+
+const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
+function calcScore(r) {
+  if (r.error && r.testsTotal === 0) return 0;
+  let s = 0;
+  if (r.specOk) s += 10;
+  if (!r.error || r.testsTotal > 0) s += 10;
+  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
+  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
+  return Math.min(100, s);
+}
+// Fill in derived fields (score, stars, prompt-token estimate) when missing
+const DATA = RAW.map(r => {
+  if (r.score == null) r.score = calcScore(r);
+  if (!r.stars) r.stars = starsFor(r.score);
+  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
+  return r;
+});
+const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
+const pctBar = (passed, total, w=80) => {
+  if (total === 0) return '-';
+  const pct = passed/total*100;
+  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
+  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
+};
+
+// Meta
+const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
+document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} - ${DATA.length} runs - ${(totalTime/1000/60).toFixed(1)} min`;
+
+// Cards
+const models = [...new Set(DATA.map(r => r.model))];
+const scenarios = [...new Set(DATA.map(r => r.scenario))];
+const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
+const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
+const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
+const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
+const bestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
+}).sort((a,b) => b.avg - a.avg)[0];
+const fastestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
+}).sort((a,b) => b.speed - a.speed)[0];
+
+document.getElementById('cards').innerHTML = `
+  <div class="card"><div class="label">Average</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} points</div></div>
+  <div class="card"><div class="label">Test pass rate</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} tests</div></div>
+  <div class="card"><div class="label">Best model</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
+  <div class="card"><div class="label">Fastest</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
+  <div class="card"><div class="label">Models</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} scenarios</div></div>
+  <div class="card"><div class="label">Total time</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minutes</div></div>
+`;
+
+// Summary table
+const sumHead = document.querySelector('#summary-table thead');
+const sumBody = document.querySelector('#summary-table tbody');
+sumHead.innerHTML = '<tr><th>Model</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Total</th><th>Out tok</th><th>Time</th><th>tok/s</th><th>Score</th></tr>';
+
+const modelRows = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
+  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
+  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
+  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
+  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
+  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
+  const scenCols = scenarios.map(s => {
+    const r = mrs.find(r => r.scenario === s);
+    if (!r) return '<td>-</td>';
+    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
+  }).join('');
+  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
+}).sort((a,b) => b.avg - a.avg);
+sumBody.innerHTML = modelRows.map(r => r.html).join('');
+
+// Results table
+const resHead = document.querySelector('#results-table thead');
+const resBody = document.querySelector('#results-table tbody');
+const resCols = ['Model','Scenario','Spec','Tests','Fixes','Ctx','Out tok','Time','tok/s','Score'];
+resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
+
+let sortCol = 9, sortAsc = false;
+function renderResults() {
+  const sorted = [...DATA].sort((a,b) => {
+    const vals = [
+      [a.model, b.model],
+      [a.scenario, b.scenario],
+      [a.specEntities, b.specEntities],
+      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
+      [a.fixRounds, b.fixRounds],
+      [a.promptTokensEst, b.promptTokensEst],
+      [a.totalTokens, b.totalTokens],
+      [a.totalDurationMs, b.totalDurationMs],
+      [a.avgTokPerSec, b.avgTokPerSec],
+      [a.score, b.score],
+    ][sortCol];
+    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
+    return sortAsc ? cmp : -cmp;
+  });
+  resBody.innerHTML = sorted.map(r => {
+    const c = cls(r);
+    return `<tr>
+      <td class="model-name">${r.model}</td>
+      <td>${r.scenario}</td>
+      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
+      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
+      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
+      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
+      <td>${r.avgTokPerSec.toFixed(0)}</td>
+      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
+    </tr>`;
+  }).join('');
+  document.querySelectorAll('#results-table th').forEach((th,i) => {
+    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
+  });
+}
+document.querySelector('#results-table thead').addEventListener('click', e => {
+  const col = parseInt(e.target.dataset.col, 10);
+  if (Number.isNaN(col)) return;
+  if (sortCol === col) sortAsc = !sortAsc;
+  else { sortCol = col; sortAsc = false; }
+  renderResults();
+});
+renderResults();
+</script>
+</body>
+</html>
--- a/kipina-codebench/results/2026-04-14T09-47.json
+++ b/kipina-codebench/results/2026-04-14T09-47.json
@@ -0,0 +1,62 @@
+[
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 8,
+    "testsPassed": 6,
+    "testsFailed": 2,
+    "totalDurationMs": 97470,
+    "totalTokens": 8786,
+    "avgTokPerSec": 97.96636139685832,
+    "promptChars": 11290,
+    "promptTokensEst": 2823,
+    "score": 65,
+    "stars": "★★★☆☆",
+    "error": null
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 18951,
+    "totalTokens": 1666,
+    "avgTokPerSec": 101.807593927545,
+    "promptChars": 10293,
+    "promptTokensEst": 2573,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 1,
+    "testsPassed": 0,
+    "testsFailed": 1,
+    "totalDurationMs": 126005,
+    "totalTokens": 11056,
+    "avgTokPerSec": 96.6373549161171,
+    "promptChars": 11878,
+    "promptTokensEst": 2970,
+    "score": 20,
+    "stars": "★☆☆☆☆",
+    "error": "Syntaksivirhe"
+  }
+]
--- a/kipina-codebench/results/2026-04-14T09-52.html
+++ b/kipina-codebench/results/2026-04-14T09-52.html
--- a/kipina-codebench/results/2026-04-14T09-52.json
+++ b/kipina-codebench/results/2026-04-14T09-52.json
@@ -0,0 +1,947 @@
+[
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 25444,
+    "totalTokens": 2661,
+    "avgTokPerSec": 122.06801173056196,
+    "promptChars": 11849,
+    "promptTokensEst": 2962,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 24447,
+    "totalTokens": 2537,
+    "avgTokPerSec": 121.11837170891442,
+    "promptChars": 11045,
+    "promptTokensEst": 2761,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 11,
+    "testsPassed": 11,
+    "testsFailed": 0,
+    "totalDurationMs": 38071,
+    "totalTokens": 3965,
+    "avgTokPerSec": 120.37309655579647,
+    "promptChars": 12702,
+    "promptTokensEst": 3176,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 38459,
+    "totalTokens": 2106,
+    "avgTokPerSec": 60.889088461567745,
+    "promptChars": 10951,
+    "promptTokensEst": 2738,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 35959,
+    "totalTokens": 1966,
+    "avgTokPerSec": 60.9684885562545,
+    "promptChars": 10698,
+    "promptTokensEst": 2675,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 13,
+    "testsPassed": 2,
+    "testsFailed": 11,
+    "totalDurationMs": 269370,
+    "totalTokens": 14361,
+    "avgTokPerSec": 57.79069860126629,
+    "promptChars": 11838,
+    "promptTokensEst": 2960,
+    "score": 29,
+    "stars": "★★☆☆☆",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 23199,
+    "totalTokens": 2054,
+    "avgTokPerSec": 101.09280595816365,
+    "promptChars": 10854,
+    "promptTokensEst": 2714,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 1,
+    "testsPassed": 0,
+    "testsFailed": 1,
+    "totalDurationMs": 72665,
+    "totalTokens": 6586,
+    "avgTokPerSec": 99.40636298490288,
+    "promptChars": 10157,
+    "promptTokensEst": 2539,
+    "score": 20,
+    "stars": "★☆☆☆☆",
+    "error": "Syntaksivirhe",
+    "round": 1
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 136309,
+    "totalTokens": 12036,
+    "avgTokPerSec": 97.02525169408467,
+    "promptChars": 10823,
+    "promptTokensEst": 2706,
+    "score": 0,
+    "stars": "☆☆☆☆☆",
+    "error": "Testit kaatuivat",
+    "round": 1
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 28177,
+    "totalTokens": 2946,
+    "avgTokPerSec": 121.23541038097,
+    "promptChars": 11836,
+    "promptTokensEst": 2959,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 8,
+    "testsPassed": 8,
+    "testsFailed": 0,
+    "totalDurationMs": 22631,
+    "totalTokens": 2352,
+    "avgTokPerSec": 121.93930190168658,
+    "promptChars": 10440,
+    "promptTokensEst": 2610,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 12,
+    "testsPassed": 12,
+    "testsFailed": 0,
+    "totalDurationMs": 40394,
+    "totalTokens": 4225,
+    "avgTokPerSec": 120.84107397324551,
+    "promptChars": 12362,
+    "promptTokensEst": 3091,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 7,
+    "testsPassed": 7,
+    "testsFailed": 0,
+    "totalDurationMs": 46081,
+    "totalTokens": 2542,
+    "avgTokPerSec": 60.93046828700026,
+    "promptChars": 11412,
+    "promptTokensEst": 2853,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 7,
+    "testsPassed": 7,
+    "testsFailed": 0,
+    "totalDurationMs": 41323,
+    "totalTokens": 2272,
+    "avgTokPerSec": 60.99406174164295,
+    "promptChars": 10884,
+    "promptTokensEst": 2721,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 14,
+    "testsPassed": 2,
+    "testsFailed": 12,
+    "totalDurationMs": 262591,
+    "totalTokens": 14129,
+    "avgTokPerSec": 57.91340837830759,
+    "promptChars": 12143,
+    "promptTokensEst": 3036,
+    "score": 29,
+    "stars": "★★☆☆☆",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 24007,
+    "totalTokens": 2137,
+    "avgTokPerSec": 101.05982103292858,
+    "promptChars": 10756,
+    "promptTokensEst": 2689,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 7,
+    "testsPassed": 6,
+    "testsFailed": 1,
+    "totalDurationMs": 68739,
+    "totalTokens": 6199,
+    "avgTokPerSec": 98.9825675198183,
+    "promptChars": 10313,
+    "promptTokensEst": 2578,
+    "score": 71,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": false,
+    "specEntities": 0,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 0,
+    "promptTokensEst": 0,
+    "score": 0,
+    "stars": "",
+    "error": "JSON-speksi epäonnistui",
+    "round": 2
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 23472,
+    "totalTokens": 2427,
+    "avgTokPerSec": 120.85293828875076,
+    "promptChars": 11663,
+    "promptTokensEst": 2916,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 8,
+    "testsPassed": 8,
+    "testsFailed": 0,
+    "totalDurationMs": 25864,
+    "totalTokens": 2671,
+    "avgTokPerSec": 120.6883137195962,
+    "promptChars": 11148,
+    "promptTokensEst": 2787,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 12,
+    "testsPassed": 12,
+    "testsFailed": 0,
+    "totalDurationMs": 41074,
+    "totalTokens": 4275,
+    "avgTokPerSec": 120.33351485161673,
+    "promptChars": 12664,
+    "promptTokensEst": 3166,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 7,
+    "testsPassed": 7,
+    "testsFailed": 0,
+    "totalDurationMs": 40457,
+    "totalTokens": 2229,
+    "avgTokPerSec": 61.093615619948345,
+    "promptChars": 10905,
+    "promptTokensEst": 2726,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 1,
+    "testsTotal": 7,
+    "testsPassed": 7,
+    "testsFailed": 0,
+    "totalDurationMs": 77506,
+    "totalTokens": 4268,
+    "avgTokPerSec": 60.19655522627278,
+    "promptChars": 11135,
+    "promptTokensEst": 2784,
+    "score": 90,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 12,
+    "testsPassed": 12,
+    "testsFailed": 0,
+    "totalDurationMs": 74791,
+    "totalTokens": 3590,
+    "avgTokPerSec": 60.549298891176214,
+    "promptChars": 11653,
+    "promptTokensEst": 2913,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 26402,
+    "totalTokens": 2358,
+    "avgTokPerSec": 100.76936895480246,
+    "promptChars": 11243,
+    "promptTokensEst": 2811,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 20751,
+    "totalTokens": 1837,
+    "avgTokPerSec": 101.05480893032836,
+    "promptChars": 10553,
+    "promptTokensEst": 2638,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": false,
+    "specEntities": 0,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 0,
+    "promptTokensEst": 0,
+    "score": 0,
+    "stars": "",
+    "error": "JSON-speksi epäonnistui",
+    "round": 3
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 22098,
+    "totalTokens": 2283,
+    "avgTokPerSec": 121.81254413612446,
+    "promptChars": 11503,
+    "promptTokensEst": 2876,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 2,
+    "testsTotal": 8,
+    "testsPassed": 8,
+    "testsFailed": 0,
+    "totalDurationMs": 65403,
+    "totalTokens": 6779,
+    "avgTokPerSec": 118.13288294758586,
+    "promptChars": 10939,
+    "promptTokensEst": 2735,
+    "score": 80,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 10,
+    "testsPassed": 10,
+    "testsFailed": 0,
+    "totalDurationMs": 36044,
+    "totalTokens": 3748,
+    "avgTokPerSec": 120.14822967005487,
+    "promptChars": 12639,
+    "promptTokensEst": 3160,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 38501,
+    "totalTokens": 2113,
+    "avgTokPerSec": 61.01814139430428,
+    "promptChars": 10929,
+    "promptTokensEst": 2732,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 8,
+    "testsPassed": 1,
+    "testsFailed": 7,
+    "totalDurationMs": 147057,
+    "totalTokens": 7799,
+    "avgTokPerSec": 56.209406465865904,
+    "promptChars": 11207,
+    "promptTokensEst": 2802,
+    "score": 28,
+    "stars": "★★☆☆☆",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 12,
+    "testsPassed": 12,
+    "testsFailed": 0,
+    "totalDurationMs": 227508,
+    "totalTokens": 12026,
+    "avgTokPerSec": 58.52888492610325,
+    "promptChars": 11809,
+    "promptTokensEst": 2952,
+    "score": 80,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 11,
+    "testsPassed": 11,
+    "testsFailed": 0,
+    "totalDurationMs": 131964,
+    "totalTokens": 11403,
+    "avgTokPerSec": 97.10963264920952,
+    "promptChars": 11786,
+    "promptTokensEst": 2947,
+    "score": 80,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 38820,
+    "totalTokens": 1826,
+    "avgTokPerSec": 101.07773707712924,
+    "promptChars": 10568,
+    "promptTokensEst": 2642,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": false,
+    "specEntities": 0,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 0,
+    "promptTokensEst": 0,
+    "score": 0,
+    "stars": "",
+    "error": "JSON-speksi epäonnistui",
+    "round": 4
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 1,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 39797,
+    "totalTokens": 3776,
+    "avgTokPerSec": 120.91801837211113,
+    "promptChars": 11435,
+    "promptTokensEst": 2859,
+    "score": 90,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 5
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 9,
+    "testsPassed": 8,
+    "testsFailed": 1,
+    "totalDurationMs": 87836,
+    "totalTokens": 9343,
+    "avgTokPerSec": 119.28783662683314,
+    "promptChars": 10718,
+    "promptTokensEst": 2680,
+    "score": 73,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 5
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 10,
+    "testsPassed": 10,
+    "testsFailed": 0,
+    "totalDurationMs": 36644,
+    "totalTokens": 3897,
+    "avgTokPerSec": 122.28607796191666,
+    "promptChars": 12598,
+    "promptTokensEst": 3150,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 5
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 1,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 127532,
+    "totalTokens": 3919,
+    "avgTokPerSec": 34.13133325491828,
+    "promptChars": 11352,
+    "promptTokensEst": 2838,
+    "score": 90,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 5
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 8,
+    "testsPassed": 6,
+    "testsFailed": 2,
+    "totalDurationMs": 217365,
+    "totalTokens": 7764,
+    "avgTokPerSec": 38.67613170588518,
+    "promptChars": 10834,
+    "promptTokensEst": 2709,
+    "score": 65,
+    "stars": "★★★☆☆",
+    "error": null,
+    "round": 5
+  },
+  {
+    "model": "qwen3:14b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 14,
+    "testsPassed": 7,
+    "testsFailed": 7,
+    "totalDurationMs": 248311,
+    "totalTokens": 13443,
+    "avgTokPerSec": 58.05680015263308,
+    "promptChars": 12219,
+    "promptTokensEst": 3055,
+    "score": 50,
+    "stars": "★★★☆☆",
+    "error": null,
+    "round": 5
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 38326,
+    "totalTokens": 2079,
+    "avgTokPerSec": 100.89778087504016,
+    "promptChars": 10908,
+    "promptTokensEst": 2727,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 5
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 60823,
+    "totalTokens": 1772,
+    "avgTokPerSec": 96.76383996716295,
+    "promptChars": 10378,
+    "promptTokensEst": 2595,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 5
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 11,
+    "testsPassed": 11,
+    "testsFailed": 0,
+    "totalDurationMs": 81654,
+    "totalTokens": 3458,
+    "avgTokPerSec": 95.65675360193613,
+    "promptChars": 11914,
+    "promptTokensEst": 2979,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 5
+  }
+]
--- a/kipina-codebench/results/2026-04-14T10-03.html
+++ b/kipina-codebench/results/2026-04-14T10-03.html
@@ -0,0 +1,183 @@
+<!DOCTYPE html>
+<html lang="fi">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Kipina Model Benchmark</title>
+<style>
+  :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
+  * { box-sizing: border-box; margin: 0; padding: 0; }
+  body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
+  h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
+  .meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
+  .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
+  .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
+  .card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
+  .card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
+  .card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
+  table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
+  th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
+  th:hover { color: var(--text); }
+  th.sorted-asc::after { content: ' ▲'; }
+  th.sorted-desc::after { content: ' ▼'; }
+  td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
+  tr:hover td { background: #1c2128; }
+  .pass { color: var(--green); }
+  .partial { color: var(--yellow); }
+  .fail { color: var(--red); }
+  .stars { letter-spacing: 1px; }
+  .bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
+  .bar-bg { background: var(--border); }
+  .bar-fill { background: var(--green); }
+  .bar-partial { background: var(--yellow); }
+  .model-name { font-weight: 600; }
+  h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
+  .summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
+</style>
+</head>
+<body>
+
+<h1>Kipina Model Benchmark</h1>
+<div class="meta" id="meta"></div>
+
+<div class="cards" id="cards"></div>
+
+<h2>Per-model summary</h2>
+<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>
+
+<h2>All results</h2>
+<table id="results-table"><thead></thead><tbody></tbody></table>
+
+<script>
+const RAW = [];
+
+const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
+function calcScore(r) {
+  if (r.error && r.testsTotal === 0) return 0;
+  let s = 0;
+  if (r.specOk) s += 10;
+  if (!r.error || r.testsTotal > 0) s += 10;
+  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
+  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
+  return Math.min(100, s);
+}
+// Fill in score, stars, and the prompt-token estimate when they are missing
+const DATA = RAW.map(r => {
+  if (r.score == null) r.score = calcScore(r);
+  if (!r.stars) r.stars = starsFor(r.score);
+  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
+  return r;
+});
+const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
+const pctBar = (passed, total, w=80) => {
+  if (total === 0) return '-';
+  const pct = passed/total*100;
+  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
+  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
+};
+
+// Meta
+const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
+document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;
+
+// Cards
+const models = [...new Set(DATA.map(r => r.model))];
+const scenarios = [...new Set(DATA.map(r => r.scenario))];
+const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
+const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
+const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
+const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
+const bestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
+}).sort((a,b) => b.avg - a.avg)[0];
+const fastestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / Math.max(mrs.filter(r => r.avgTokPerSec > 0).length, 1)) };
+}).sort((a,b) => b.speed - a.speed)[0];
+
+document.getElementById('cards').innerHTML = `
+  <div class="card"><div class="label">Average</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} points</div></div>
+  <div class="card"><div class="label">Test pass rate</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} tests</div></div>
+  <div class="card"><div class="label">Best model</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
+  <div class="card"><div class="label">Fastest</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
+  <div class="card"><div class="label">Models</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} scenarios</div></div>
+  <div class="card"><div class="label">Total time</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minutes</div></div>
+`;
+
+// Summary table
+const sumHead = document.querySelector('#summary-table thead');
+const sumBody = document.querySelector('#summary-table tbody');
+sumHead.innerHTML = '<tr><th>Model</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Total</th><th>Out tok</th><th>Time</th><th>tok/s</th><th>Score</th></tr>';
+
+const modelRows = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
+  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
+  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
+  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
+  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / Math.max(mrs.filter(r => r.avgTokPerSec > 0).length, 1));
+  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
+  const scenCols = scenarios.map(s => {
+    const r = mrs.find(r => r.scenario === s);
+    if (!r) return '<td>-</td>';
+    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
+  }).join('');
+  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
+}).sort((a,b) => b.avg - a.avg);
+sumBody.innerHTML = modelRows.map(r => r.html).join('');
+
+// Results table
+const resHead = document.querySelector('#results-table thead');
+const resBody = document.querySelector('#results-table tbody');
+const resCols = ['Model','Scenario','Spec','Tests','Fix rounds','Ctx','Out tok','Time','tok/s','Score'];
+resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
+
+let sortCol = 9, sortAsc = false;
+function renderResults() {
+  const sorted = [...DATA].sort((a,b) => {
+    const vals = [
+      [a.model, b.model],
+      [a.scenario, b.scenario],
+      [a.specEntities, b.specEntities],
+      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
+      [a.fixRounds, b.fixRounds],
+      [a.promptTokensEst, b.promptTokensEst],
+      [a.totalTokens, b.totalTokens],
+      [a.totalDurationMs, b.totalDurationMs],
+      [a.avgTokPerSec, b.avgTokPerSec],
+      [a.score, b.score],
+    ][sortCol];
+    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
+    return sortAsc ? cmp : -cmp;
+  });
+  resBody.innerHTML = sorted.map(r => {
+    const c = cls(r);
+    return `<tr>
+      <td class="model-name">${r.model}</td>
+      <td>${r.scenario}</td>
+      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
+      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
+      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
+      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
+      <td>${r.avgTokPerSec.toFixed(0)}</td>
+      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
+    </tr>`;
+  }).join('');
+  document.querySelectorAll('#results-table th').forEach((th,i) => {
+    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
+  });
+}
+document.querySelector('#results-table thead').addEventListener('click', e => {
+  const col = parseInt(e.target.dataset.col, 10);
+  if (isNaN(col)) return;
+  if (sortCol === col) sortAsc = !sortAsc;
+  else { sortCol = col; sortAsc = false; }
+  renderResults();
+});
+renderResults();
+</script>
+</body>
+</html>
--- a/kipina-codebench/results/2026-04-14T10-03.json
+++ b/kipina-codebench/results/2026-04-14T10-03.json
@@ -0,0 +1 @@
+[]
--- a/kipina-codebench/results/2026-04-14T10-31.html
+++ b/kipina-codebench/results/2026-04-14T10-31.html
--- a/kipina-codebench/results/2026-04-14T10-31.json
+++ b/kipina-codebench/results/2026-04-14T10-31.json
@@ -0,0 +1,317 @@
+[
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 97527,
+    "totalTokens": 2228,
+    "avgTokPerSec": 100.69171830800946,
+    "promptChars": 11566,
+    "promptTokensEst": 2892,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 7,
+    "testsPassed": 7,
+    "testsFailed": 0,
+    "totalDurationMs": 39549,
+    "totalTokens": 1960,
+    "avgTokPerSec": 100.98265593129491,
+    "promptChars": 11073,
+    "promptTokensEst": 2768,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 1
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": false,
+    "specEntities": 0,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 0,
+    "promptTokensEst": 0,
+    "score": 0,
+    "stars": "",
+    "error": "JSON-speksi epäonnistui",
+    "round": 1
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 131339,
+    "totalTokens": 11518,
+    "avgTokPerSec": 96.52358107464266,
+    "promptChars": 12388,
+    "promptTokensEst": 3097,
+    "score": 0,
+    "stars": "☆☆☆☆☆",
+    "error": "Testit kaatuivat",
+    "round": 2
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 20658,
+    "totalTokens": 1808,
+    "avgTokPerSec": 101.0081173861862,
+    "promptChars": 11057,
+    "promptTokensEst": 2764,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 2
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": false,
+    "specEntities": 0,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 0,
+    "promptTokensEst": 0,
+    "score": 0,
+    "stars": "",
+    "error": "JSON-speksi epäonnistui",
+    "round": 2
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 1,
+    "fixRounds": 5,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 320031,
+    "totalTokens": 11985,
+    "avgTokPerSec": 54.915025374575386,
+    "promptChars": 12517,
+    "promptTokensEst": 3129,
+    "score": 0,
+    "stars": "☆☆☆☆☆",
+    "error": "Testit kaatuivat",
+    "round": 3
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 7,
+    "testsPassed": 7,
+    "testsFailed": 0,
+    "totalDurationMs": 28654,
+    "totalTokens": 1877,
+    "avgTokPerSec": 100.70920643946336,
+    "promptChars": 10747,
+    "promptTokensEst": 2687,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 3
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": false,
+    "specEntities": 0,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 0,
+    "promptTokensEst": 0,
+    "score": 0,
+    "stars": "",
+    "error": "JSON-speksi epäonnistui",
+    "round": 3
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 1,
+    "testsTotal": 12,
+    "testsPassed": 12,
+    "testsFailed": 0,
+    "totalDurationMs": 67943,
+    "totalTokens": 6002,
+    "avgTokPerSec": 98.29436788902672,
+    "promptChars": 12389,
+    "promptTokensEst": 3097,
+    "score": 90,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 20203,
+    "totalTokens": 1774,
+    "avgTokPerSec": 100.9066297884274,
+    "promptChars": 10905,
+    "promptTokensEst": 2726,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 13,
+    "testsPassed": 12,
+    "testsFailed": 1,
+    "totalDurationMs": 148491,
+    "totalTokens": 12747,
+    "avgTokPerSec": 95.18237885727869,
+    "promptChars": 12476,
+    "promptTokensEst": 3119,
+    "score": 75,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 4
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "todo",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 6,
+    "testsPassed": 6,
+    "testsFailed": 0,
+    "totalDurationMs": 23830,
+    "totalTokens": 2102,
+    "avgTokPerSec": 100.641489789061,
+    "promptChars": 11404,
+    "promptTokensEst": 2851,
+    "score": 100,
+    "stars": "★★★★★",
+    "error": null,
+    "round": 5
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "users",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 8,
+    "testsPassed": 6,
+    "testsFailed": 2,
+    "totalDurationMs": 122453,
+    "totalTokens": 7285,
+    "avgTokPerSec": 94.12482830400619,
+    "promptChars": 11400,
+    "promptTokensEst": 2850,
+    "score": 65,
+    "stars": "★★★☆☆",
+    "error": null,
+    "round": 5
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 11,
+    "testsPassed": 10,
+    "testsFailed": 1,
+    "totalDurationMs": 147125,
+    "totalTokens": 9893,
+    "avgTokPerSec": 97.37021605085566,
+    "promptChars": 12455,
+    "promptTokensEst": 3114,
+    "score": 75,
+    "stars": "★★★★☆",
+    "error": null,
+    "round": 5
+  }
+]
--- a/kipina-codebench/results/2026-04-14T10-59.html
+++ b/kipina-codebench/results/2026-04-14T10-59.html
@@ -0,0 +1,183 @@
+<!DOCTYPE html>
+<html lang="fi">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Kipina Model Benchmark</title>
+<style>
+  :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
+  * { box-sizing: border-box; margin: 0; padding: 0; }
+  body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
+  h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
+  .meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
+  .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
+  .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
+  .card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
+  .card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
+  .card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
+  table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
+  th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
+  th:hover { color: var(--text); }
+  th.sorted-asc::after { content: ' ▲'; }
+  th.sorted-desc::after { content: ' ▼'; }
+  td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
+  tr:hover td { background: #1c2128; }
+  .pass { color: var(--green); }
+  .partial { color: var(--yellow); }
+  .fail { color: var(--red); }
+  .stars { letter-spacing: 1px; }
+  .bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
+  .bar-bg { background: var(--border); }
+  .bar-fill { background: var(--green); }
+  .bar-partial { background: var(--yellow); }
+  .model-name { font-weight: 600; }
+  h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
+  .summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
+</style>
+</head>
+<body>
+
+<h1>Kipina Model Benchmark</h1>
+<div class="meta" id="meta"></div>
+
+<div class="cards" id="cards"></div>
+
+<h2>Mallikohtainen yhteenveto</h2>
+<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>
+
+<h2>Kaikki tulokset</h2>
+<table id="results-table"><thead></thead><tbody></tbody></table>
+
+<script>
+const RAW = [{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":1,"testsTotal":11,"testsPassed":11,"testsFailed":0,"totalDurationMs":64124,"totalTokens":5689,"avgTokPerSec":98.61378134916481,"promptChars":12098,"promptTokensEst":3025,"score":90,"stars":"★★★★★","error":null,"profile":"small","promptName":"code-small","round":1},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":126014,"totalTokens":11162,"avgTokPerSec":97.09858655726343,"promptChars":12101,"promptTokensEst":3025,"score":0,"stars":"☆☆☆☆☆","error":"Testit kaatuivat","profile":"small","promptName":"code-small","round":2},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":false,"specEntities":0,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":0,"promptTokensEst":0,"score":0,"stars":"","error":"JSON-speksi epäonnistui","round":3}];
+
+const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
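+function calcScore(r) {
+  if (r.error && r.testsTotal === 0) return 0; // failed before any test ran
+  let s = 0;
+  if (r.specOk) s += 10; // valid JSON spec
+  if (!r.error || r.testsTotal > 0) s += 10; // no error, or at least one test ran
+  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60); // pass rate, up to 60
+  s += Math.max(0, 20 - (r.fixRounds || 0) * 10); // 20 bonus points, minus 10 per fix round
+  return Math.min(100, s);
+}
+// Fill in score, stars, and the prompt-token estimate when they are missing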
+const DATA = RAW.map(r => {
+  if (r.score == null) r.score = calcScore(r);
+  if (!r.stars) r.stars = starsFor(r.score);
+  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
+  return r;
+});
+const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
+const pctBar = (passed, total, w=80) => {
+  if (total === 0) return '-';
+  const pct = passed/total*100;
+  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
+  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
+};
+
+// Meta
+const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
+document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;
+
+// Cards
+const models = [...new Set(DATA.map(r => r.model))];
+const scenarios = [...new Set(DATA.map(r => r.scenario))];
+const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
+const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
+const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
+const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
+const bestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
+}).sort((a,b) => b.avg - a.avg)[0];
+const fastestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
+}).sort((a,b) => b.speed - a.speed)[0];
+
+document.getElementById('cards').innerHTML = `
+  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistettä</div></div>
+  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
+  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
+  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
+  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
+  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
+`;
+
+// Summary table
+const sumHead = document.querySelector('#summary-table thead');
+const sumBody = document.querySelector('#summary-table tbody');
+sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';
+
+const modelRows = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
+  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
+  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
+  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
+  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
+  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
+  const scenCols = scenarios.map(s => {
+    const r = mrs.find(r => r.scenario === s);
+    if (!r) return '<td>-</td>';
+    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
+  }).join('');
+  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
+}).sort((a,b) => b.avg - a.avg);
+sumBody.innerHTML = modelRows.map(r => r.html).join('');
+
+// Results table
+const resHead = document.querySelector('#results-table thead');
+const resBody = document.querySelector('#results-table tbody');
+const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
+resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
+
+let sortCol = 9, sortAsc = false;
+function renderResults() {
+  const sorted = [...DATA].sort((a,b) => {
+    const vals = [
+      [a.model, b.model],
+      [a.scenario, b.scenario],
+      [a.specEntities, b.specEntities],
+      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
+      [a.fixRounds, b.fixRounds],
+      [a.promptTokensEst, b.promptTokensEst],
+      [a.totalTokens, b.totalTokens],
+      [a.totalDurationMs, b.totalDurationMs],
+      [a.avgTokPerSec, b.avgTokPerSec],
+      [a.score, b.score],
+    ][sortCol];
+    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
+    return sortAsc ? cmp : -cmp;
+  });
+  resBody.innerHTML = sorted.map(r => {
+    const c = cls(r);
+    return `<tr>
+      <td class="model-name">${r.model}</td>
+      <td>${r.scenario}</td>
+      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
+      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
+      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
+      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
+      <td>${r.avgTokPerSec.toFixed(0)}</td>
+      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
+    </tr>`;
+  }).join('');
+  document.querySelectorAll('#results-table th').forEach((th,i) => {
+    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
+  });
+}
+document.querySelector('#results-table thead').addEventListener('click', e => {
+  const col = parseInt(e.target.dataset.col, 10);
+  if (isNaN(col)) return;
+  if (sortCol === col) sortAsc = !sortAsc;
+  else { sortCol = col; sortAsc = false; }
+  renderResults();
+});
+renderResults();
+</script>
+</body>
+</html>
--- a/kipina-codebench/results/2026-04-14T10-59.json
+++ b/kipina-codebench/results/2026-04-14T10-59.json
@@ -0,0 +1,69 @@
+[
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 1,
+    "testsTotal": 11,
+    "testsPassed": 11,
+    "testsFailed": 0,
+    "totalDurationMs": 64124,
+    "totalTokens": 5689,
+    "avgTokPerSec": 98.61378134916481,
+    "promptChars": 12098,
+    "promptTokensEst": 3025,
+    "score": 90,
+    "stars": "★★★★★",
+    "error": null,
+    "profile": "small",
+    "promptName": "code-small",
+    "round": 1
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 126014,
+    "totalTokens": 11162,
+    "avgTokPerSec": 97.09858655726343,
+    "promptChars": 12101,
+    "promptTokensEst": 3025,
+    "score": 0,
+    "stars": "☆☆☆☆☆",
+    "error": "Testit kaatuivat",
+    "profile": "small",
+    "promptName": "code-small",
+    "round": 2
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": false,
+    "specEntities": 0,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 0,
+    "promptTokensEst": 0,
+    "score": 0,
+    "stars": "",
+    "error": "JSON-speksi epäonnistui",
+    "round": 3
+  }
+]
--- a/kipina-codebench/results/2026-04-14T11-06.html
+++ b/kipina-codebench/results/2026-04-14T11-06.html
@@ -0,0 +1,183 @@
+<!DOCTYPE html>
+<html lang="fi">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Kipina Model Benchmark</title>
+<style>
+  :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
+  * { box-sizing: border-box; margin: 0; padding: 0; }
+  body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
+  h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
+  .meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
+  .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
+  .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
+  .card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
+  .card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
+  .card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
+  table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
+  th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
+  th:hover { color: var(--text); }
+  th.sorted-asc::after { content: ' ▲'; }
+  th.sorted-desc::after { content: ' ▼'; }
+  td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
+  tr:hover td { background: #1c2128; }
+  .pass { color: var(--green); }
+  .partial { color: var(--yellow); }
+  .fail { color: var(--red); }
+  .stars { letter-spacing: 1px; }
+  .bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
+  .bar-bg { background: var(--border); }
+  .bar-fill { background: var(--green); }
+  .bar-partial { background: var(--yellow); }
+  .model-name { font-weight: 600; }
+  h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
+  .summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
+</style>
+</head>
+<body>
+
+<h1>Kipina Model Benchmark</h1>
+<div class="meta" id="meta"></div>
+
+<div class="cards" id="cards"></div>
+
+<h2>Mallikohtainen yhteenveto</h2>
+<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>
+
+<h2>Kaikki tulokset</h2>
+<table id="results-table"><thead></thead><tbody></tbody></table>
+
+<script>
+const RAW = [{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":12,"testsPassed":10,"testsFailed":2,"totalDurationMs":139308,"totalTokens":11782,"avgTokPerSec":96.85039238572556,"promptChars":11148,"promptTokensEst":2787,"score":70,"stars":"★★★★☆","error":null,"profile":"small","promptName":"code-small","round":1},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":1,"testsPassed":0,"testsFailed":1,"totalDurationMs":132306,"totalTokens":11671,"avgTokPerSec":96.88921767777383,"promptChars":11267,"promptTokensEst":2817,"score":20,"stars":"★☆☆☆☆","error":"Syntaksivirhe","profile":"small","promptName":"code-small","round":2},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":12,"testsPassed":11,"testsFailed":1,"totalDurationMs":126092,"totalTokens":11132,"avgTokPerSec":96.98598556369416,"promptChars":11292,"promptTokensEst":2823,"score":75,"stars":"★★★★☆","error":null,"profile":"small","promptName":"code-small","round":3}];
+
+const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
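+function calcScore(r) {
+  if (r.error && r.testsTotal === 0) return 0; // failed before any test ran
+  let s = 0;
+  if (r.specOk) s += 10; // valid JSON spec
+  if (!r.error || r.testsTotal > 0) s += 10; // no error, or at least one test ran
+  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60); // pass rate, up to 60
+  s += Math.max(0, 20 - (r.fixRounds || 0) * 10); // 20 bonus points, minus 10 per fix round
+  return Math.min(100, s);
+}
+// Fill in score, stars, and the prompt-token estimate when they are missing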
+const DATA = RAW.map(r => {
+  if (r.score == null) r.score = calcScore(r);
+  if (!r.stars) r.stars = starsFor(r.score);
+  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
+  return r;
+});
+const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
+const pctBar = (passed, total, w=80) => {
+  if (total === 0) return '-';
+  const pct = passed/total*100;
+  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
+  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
+};
+
+// Meta
+const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
+document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;
+
+// Cards
+const models = [...new Set(DATA.map(r => r.model))];
+const scenarios = [...new Set(DATA.map(r => r.scenario))];
+const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
+const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
+const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
+const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
+const bestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
+}).sort((a,b) => b.avg - a.avg)[0];
+const fastestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
+}).sort((a,b) => b.speed - a.speed)[0];
+
+document.getElementById('cards').innerHTML = `
+  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistettä</div></div>
+  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
+  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
+  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
+  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
+  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
+`;
+
+// Summary table
+const sumHead = document.querySelector('#summary-table thead');
+const sumBody = document.querySelector('#summary-table tbody');
+sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';
+
+const modelRows = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
+  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
+  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
+  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
+  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
+  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
+  const scenCols = scenarios.map(s => {
+    const r = mrs.find(r => r.scenario === s);
+    if (!r) return '<td>-</td>';
+    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
+  }).join('');
+  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
+}).sort((a,b) => b.avg - a.avg);
+sumBody.innerHTML = modelRows.map(r => r.html).join('');
+
+// Results table
+const resHead = document.querySelector('#results-table thead');
+const resBody = document.querySelector('#results-table tbody');
+const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
+resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
+
+let sortCol = 9, sortAsc = false;
+function renderResults() {
+  const sorted = [...DATA].sort((a,b) => {
+    const vals = [
+      [a.model, b.model],
+      [a.scenario, b.scenario],
+      [a.specEntities, b.specEntities],
+      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
+      [a.fixRounds, b.fixRounds],
+      [a.promptTokensEst, b.promptTokensEst],
+      [a.totalTokens, b.totalTokens],
+      [a.totalDurationMs, b.totalDurationMs],
+      [a.avgTokPerSec, b.avgTokPerSec],
+      [a.score, b.score],
+    ][sortCol];
+    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
+    return sortAsc ? cmp : -cmp;
+  });
+  resBody.innerHTML = sorted.map(r => {
+    const c = cls(r);
+    return `<tr>
+      <td class="model-name">${r.model}</td>
+      <td>${r.scenario}</td>
+      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
+      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
+      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
+      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
+      <td>${r.avgTokPerSec.toFixed(0)}</td>
+      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
+    </tr>`;
+  }).join('');
+  document.querySelectorAll('#results-table th').forEach((th,i) => {
+    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
+  });
+}
+document.querySelector('#results-table thead').addEventListener('click', e => {
+  const col = parseInt(e.target.dataset.col, 10);
+  if (isNaN(col)) return;
+  if (sortCol === col) sortAsc = !sortAsc;
+  else { sortCol = col; sortAsc = false; }
+  renderResults();
+});
+renderResults();
+</script>
+</body>
+</html>
--- a/kipina-codebench/results/2026-04-14T11-06.json
+++ b/kipina-codebench/results/2026-04-14T11-06.json
@@ -0,0 +1,71 @@
+[
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 12,
+    "testsPassed": 10,
+    "testsFailed": 2,
+    "totalDurationMs": 139308,
+    "totalTokens": 11782,
+    "avgTokPerSec": 96.85039238572556,
+    "promptChars": 11148,
+    "promptTokensEst": 2787,
+    "score": 70,
+    "stars": "★★★★☆",
+    "error": null,
+    "profile": "small",
+    "promptName": "code-small",
+    "round": 1
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 1,
+    "testsPassed": 0,
+    "testsFailed": 1,
+    "totalDurationMs": 132306,
+    "totalTokens": 11671,
+    "avgTokPerSec": 96.88921767777383,
+    "promptChars": 11267,
+    "promptTokensEst": 2817,
+    "score": 20,
+    "stars": "★☆☆☆☆",
+    "error": "Syntaksivirhe",
+    "profile": "small",
+    "promptName": "code-small",
+    "round": 2
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 12,
+    "testsPassed": 11,
+    "testsFailed": 1,
+    "totalDurationMs": 126092,
+    "totalTokens": 11132,
+    "avgTokPerSec": 96.98598556369416,
+    "promptChars": 11292,
+    "promptTokensEst": 2823,
+    "score": 75,
+    "stars": "★★★★☆",
+    "error": null,
+    "profile": "small",
+    "promptName": "code-small",
+    "round": 3
+  }
+]
--- a/kipina-codebench/results/2026-04-14T11-15.html
+++ b/kipina-codebench/results/2026-04-14T11-15.html
@@ -0,0 +1,183 @@
+<!DOCTYPE html>
+<html lang="fi">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Kipina Model Benchmark</title>
+<style>
+  :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
+  * { box-sizing: border-box; margin: 0; padding: 0; }
+  body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
+  h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
+  .meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
+  .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
+  .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
+  .card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
+  .card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
+  .card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
+  table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
+  th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
+  th:hover { color: var(--text); }
+  th.sorted-asc::after { content: ' ▲'; }
+  th.sorted-desc::after { content: ' ▼'; }
+  td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
+  tr:hover td { background: #1c2128; }
+  .pass { color: var(--green); }
+  .partial { color: var(--yellow); }
+  .fail { color: var(--red); }
+  .stars { letter-spacing: 1px; }
+  .bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
+  .bar-bg { background: var(--border); }
+  .bar-fill { background: var(--green); }
+  .bar-partial { background: var(--yellow); }
+  .model-name { font-weight: 600; }
+  h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
+  .summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
+</style>
+</head>
+<body>
+
+<h1>Kipina Model Benchmark</h1>
+<div class="meta" id="meta"></div>
+
+<div class="cards" id="cards"></div>
+
+<h2>Mallikohtainen yhteenveto</h2>
+<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>
+
+<h2>Kaikki tulokset</h2>
+<table id="results-table"><thead></thead><tbody></tbody></table>
+
+<script>
+const RAW = [{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":3,"testsTotal":11,"testsPassed":9,"testsFailed":2,"totalDurationMs":75178,"totalTokens":9916,"avgTokPerSec":142.94675043471062,"promptChars":10516,"promptTokensEst":2629,"score":69,"stars":"★★★☆☆","error":null,"profile":"small","promptName":"code-small","round":1},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":1,"fixRounds":5,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":98787,"totalTokens":12904,"avgTokPerSec":141.16873850064812,"promptChars":11810,"promptTokensEst":2953,"score":0,"stars":"☆☆☆☆☆","error":"Testit kaatuivat","profile":"small","promptName":"code-small","round":2},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":1,"testsPassed":0,"testsFailed":1,"totalDurationMs":81763,"totalTokens":10277,"avgTokPerSec":134.82946940948588,"promptChars":11534,"promptTokensEst":2884,"score":20,"stars":"★☆☆☆☆","error":"Syntaksivirhe","profile":"small","promptName":"code-small","round":3},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":3,"testsTotal":1,"testsPassed":0,"testsFailed":1,"totalDurationMs":88517,"totalTokens":11280,"avgTokPerSec":136.63597159351744,"promptChars":10568,"promptTokensEst":2642,"score":20,"stars":"★☆☆☆☆","error":"Syntaksivirhe","profile":"small","promptName":"code-small","round":4},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":12,"testsPassed":9,"testsFailed":3,"totalDurationMs":87817,"totalTokens":11171,"avgTokPerSec":136.1538785139482,"promptChars":11627,"promptTokensEst":2907,"score":65,"stars":"★★★☆☆","error":null,"profile":"small","promptName":"code-small","round":5}];
+
+const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
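+function calcScore(r) {
+  if (r.error && r.testsTotal === 0) return 0; // failed before any test ran
+  let s = 0;
+  if (r.specOk) s += 10; // valid JSON spec
+  if (!r.error || r.testsTotal > 0) s += 10; // no error, or at least one test ran
+  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60); // pass rate, up to 60
+  s += Math.max(0, 20 - (r.fixRounds || 0) * 10); // 20 bonus points, minus 10 per fix round
+  return Math.min(100, s);
+}
+// Fill in score, stars, and the prompt-token estimate when they are missing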
+const DATA = RAW.map(r => {
+  if (r.score == null) r.score = calcScore(r);
+  if (!r.stars) r.stars = starsFor(r.score);
+  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
+  return r;
+});
+const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
+const pctBar = (passed, total, w=80) => {
+  if (total === 0) return '-';
+  const pct = passed/total*100;
+  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
+  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
+};
+
+// Meta
+const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
+document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;
+
+// Cards
+const models = [...new Set(DATA.map(r => r.model))];
+const scenarios = [...new Set(DATA.map(r => r.scenario))];
+const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
+const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
+const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
+const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
+const bestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
+}).sort((a,b) => b.avg - a.avg)[0];
+const fastestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
+}).sort((a,b) => b.speed - a.speed)[0];
+
+document.getElementById('cards').innerHTML = `
+  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistettä</div></div>
+  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
+  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
+  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
+  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
+  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
+`;
+
+// Summary table
+const sumHead = document.querySelector('#summary-table thead');
+const sumBody = document.querySelector('#summary-table tbody');
+sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';
+
+const modelRows = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
+  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
+  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
+  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
+  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
+  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
+  const scenCols = scenarios.map(s => {
+    const r = mrs.find(r => r.scenario === s);
+    if (!r) return '<td>-</td>';
+    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
+  }).join('');
+  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
+}).sort((a,b) => b.avg - a.avg);
+sumBody.innerHTML = modelRows.map(r => r.html).join('');
+
+// Results table
+const resHead = document.querySelector('#results-table thead');
+const resBody = document.querySelector('#results-table tbody');
+const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
+resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
+
+let sortCol = 9, sortAsc = false;
+function renderResults() {
+  const sorted = [...DATA].sort((a,b) => {
+    const vals = [
+      [a.model, b.model],
+      [a.scenario, b.scenario],
+      [a.specEntities, b.specEntities],
+      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
+      [a.fixRounds, b.fixRounds],
+      [a.promptTokensEst, b.promptTokensEst],
+      [a.totalTokens, b.totalTokens],
+      [a.totalDurationMs, b.totalDurationMs],
+      [a.avgTokPerSec, b.avgTokPerSec],
+      [a.score, b.score],
+    ][sortCol];
+    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
+    return sortAsc ? cmp : -cmp;
+  });
+  resBody.innerHTML = sorted.map(r => {
+    const c = cls(r);
+    return `<tr>
+      <td class="model-name">${r.model}</td>
+      <td>${r.scenario}</td>
+      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
+      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
+      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
+      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
+      <td>${r.avgTokPerSec.toFixed(0)}</td>
+      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
+    </tr>`;
+  }).join('');
+  document.querySelectorAll('#results-table th').forEach((th,i) => {
+    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
+  });
+}
+document.querySelector('#results-table thead').addEventListener('click', e => {
+  const col = parseInt(e.target.dataset.col, 10);
+  if (isNaN(col)) return;
+  if (sortCol === col) sortAsc = !sortAsc;
+  else { sortCol = col; sortAsc = false; }
+  renderResults();
+});
+renderResults();
+</script>
+</body>
+</html>
--- a/kipina-codebench/results/2026-04-14T11-15.json
+++ b/kipina-codebench/results/2026-04-14T11-15.json
@@ -0,0 +1,117 @@
+[
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 11,
+    "testsPassed": 9,
+    "testsFailed": 2,
+    "totalDurationMs": 75178,
+    "totalTokens": 9916,
+    "avgTokPerSec": 142.94675043471062,
+    "promptChars": 10516,
+    "promptTokensEst": 2629,
+    "score": 69,
+    "stars": "★★★☆☆",
+    "error": null,
+    "profile": "small",
+    "promptName": "code-small",
+    "round": 1
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 1,
+    "fixRounds": 5,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 98787,
+    "totalTokens": 12904,
+    "avgTokPerSec": 141.16873850064812,
+    "promptChars": 11810,
+    "promptTokensEst": 2953,
+    "score": 0,
+    "stars": "☆☆☆☆☆",
+    "error": "Testit kaatuivat",
+    "profile": "small",
+    "promptName": "code-small",
+    "round": 2
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 1,
+    "testsPassed": 0,
+    "testsFailed": 1,
+    "totalDurationMs": 81763,
+    "totalTokens": 10277,
+    "avgTokPerSec": 134.82946940948588,
+    "promptChars": 11534,
+    "promptTokensEst": 2884,
+    "score": 20,
+    "stars": "★☆☆☆☆",
+    "error": "Syntaksivirhe",
+    "profile": "small",
+    "promptName": "code-small",
+    "round": 3
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 1,
+    "testsPassed": 0,
+    "testsFailed": 1,
+    "totalDurationMs": 88517,
+    "totalTokens": 11280,
+    "avgTokPerSec": 136.63597159351744,
+    "promptChars": 10568,
+    "promptTokensEst": 2642,
+    "score": 20,
+    "stars": "★☆☆☆☆",
+    "error": "Syntaksivirhe",
+    "profile": "small",
+    "promptName": "code-small",
+    "round": 4
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 12,
+    "testsPassed": 9,
+    "testsFailed": 3,
+    "totalDurationMs": 87817,
+    "totalTokens": 11171,
+    "avgTokPerSec": 136.1538785139482,
+    "promptChars": 11627,
+    "promptTokensEst": 2907,
+    "score": 65,
+    "stars": "★★★☆☆",
+    "error": null,
+    "profile": "small",
+    "promptName": "code-small",
+    "round": 5
+  }
+]
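Note on the scores above: the `score` values stored in this results file appear to follow the `calcScore` rule embedded in the report pages of this diff (10 points for a valid spec, 10 for a run that produced tests, up to 60 for the pass ratio, and up to 20 that shrink by 10 per fix round). The sketch below copies that function and replays it on a subset of fields from four of the rounds listed above. That the benchmark runner itself uses this exact rule is an assumption; the diff only shows it as a fallback in the report page.

```javascript
// Scoring rule as it appears in the report page's <script> block.
function calcScore(r) {
  if (r.error && r.testsTotal === 0) return 0;          // crashed before any test ran
  let s = 0;
  if (r.specOk) s += 10;                                // JSON spec was produced
  if (!r.error || r.testsTotal > 0) s += 10;            // at least reached the tests
  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);       // fewer fix rounds, more points
  return Math.min(100, s);
}

// Subset of fields copied from rounds 1, 2, 3 and 5 of 2026-04-14T11-15.json.
const rounds = [
  { round: 1, specOk: true, error: null, testsTotal: 11, testsPassed: 9, fixRounds: 3, score: 69 },
  { round: 2, specOk: true, error: "Testit kaatuivat", testsTotal: 0, testsPassed: 0, fixRounds: 5, score: 0 },
  { round: 3, specOk: true, error: "Syntaksivirhe", testsTotal: 1, testsPassed: 0, fixRounds: 3, score: 20 },
  { round: 5, specOk: true, error: null, testsTotal: 12, testsPassed: 9, fixRounds: 3, score: 65 },
];

// Compare the recomputed score with the stored one for each round.
const recomputed = rounds.map(r => ({ round: r.round, stored: r.score, computed: calcScore(r) }));
for (const row of recomputed) {
  console.log(`round ${row.round}: stored=${row.stored} computed=${row.computed}`);
}
```

With three fix rounds the last term is always zero, so in these runs the score is effectively 20 plus the pass ratio scaled to 60.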
--- a/kipina-codebench/results/2026-04-14T11-54.html
+++ b/kipina-codebench/results/2026-04-14T11-54.html
@@ -0,0 +1,183 @@
+<!DOCTYPE html>
+<html lang="fi">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Kipina Model Benchmark</title>
+<style>
+  :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
+  * { box-sizing: border-box; margin: 0; padding: 0; }
+  body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
+  h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
+  .meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
+  .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
+  .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
+  .card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
+  .card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
+  .card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
+  table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
+  th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
+  th:hover { color: var(--text); }
+  th.sorted-asc::after { content: ' ▲'; }
+  th.sorted-desc::after { content: ' ▼'; }
+  td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
+  tr:hover td { background: #1c2128; }
+  .pass { color: var(--green); }
+  .partial { color: var(--yellow); }
+  .fail { color: var(--red); }
+  .stars { letter-spacing: 1px; }
+  .bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
+  .bar-bg { background: var(--border); }
+  .bar-fill { background: var(--green); }
+  .bar-partial { background: var(--yellow); }
+  .model-name { font-weight: 600; }
+  h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
+  .summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
+</style>
+</head>
+<body>
+
+<h1>Kipina Model Benchmark</h1>
+<div class="meta" id="meta"></div>
+
+<div class="cards" id="cards"></div>
+
+<h2>Mallikohtainen yhteenveto</h2>
+<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>
+
+<h2>Kaikki tulokset</h2>
+<table id="results-table"><thead></thead><tbody></tbody></table>
+
+<script>
+const RAW = [{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":79193,"totalTokens":10304,"avgTokPerSec":141.2083113764173,"promptChars":12199,"promptTokensEst":3050,"score":0,"stars":"☆☆☆☆☆","error":"Testit kaatuivat","profile":"small","promptName":"code-small","round":1},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":10,"testsPassed":6,"testsFailed":4,"totalDurationMs":66764,"totalTokens":8896,"avgTokPerSec":142.57944640796882,"promptChars":12391,"promptTokensEst":3098,"score":56,"stars":"★★★☆☆","error":null,"profile":"small","promptName":"code-small","round":2},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":1,"testsPassed":0,"testsFailed":1,"totalDurationMs":76403,"totalTokens":9962,"avgTokPerSec":137.0023398819064,"promptChars":12432,"promptTokensEst":3108,"score":20,"stars":"★☆☆☆☆","error":"Syntaksivirhe","profile":"small","promptName":"code-small","round":3},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":3,"testsTotal":13,"testsPassed":7,"testsFailed":6,"totalDurationMs":81345,"totalTokens":10535,"avgTokPerSec":139.42076339875726,"promptChars":11419,"promptTokensEst":2855,"score":52,"stars":"★★★☆☆","error":null,"profile":"small","promptName":"code-small","round":4},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":3,"testsTotal":12,"testsPassed":11,"testsFailed":1,"totalDurationMs":72723,"totalTokens":9567,"avgTokPerSec":141.2709378394512,"promptChars":11416,"promptTokensEst":2854,"score":75,"stars":"★★★★☆","error":null,"profile":"small","promptName":"code-small","round":5}];
+
+const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
+function calcScore(r) {
+  if (r.error && r.testsTotal === 0) return 0;
+  let s = 0;
+  if (r.specOk) s += 10;
+  if (!r.error || r.testsTotal > 0) s += 10;
+  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
+  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
+  return Math.min(100, s);
+}
+// Compute score, stars, and prompt-token estimate if they are missing
+const DATA = RAW.map(r => {
+  if (r.score == null) r.score = calcScore(r);
+  if (!r.stars) r.stars = starsFor(r.score);
+  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
+  return r;
+});
+const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
+const pctBar = (passed, total, w=80) => {
+  if (total === 0) return '-';
+  const pct = passed/total*100;
+  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
+  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
+};
+
+// Meta
+const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
+document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;
+
+// Cards
+const models = [...new Set(DATA.map(r => r.model))];
+const scenarios = [...new Set(DATA.map(r => r.scenario))];
+const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
+const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
+const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
+const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
+const bestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
+}).sort((a,b) => b.avg - a.avg)[0];
+const fastestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
+}).sort((a,b) => b.speed - a.speed)[0];
+
+document.getElementById('cards').innerHTML = `
+  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistettä</div></div>
+  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
+  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
+  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
+  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
+  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
+`;
+
+// Summary table
+const sumHead = document.querySelector('#summary-table thead');
+const sumBody = document.querySelector('#summary-table tbody');
+sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';
+
+const modelRows = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
+  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
+  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
+  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
+  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
+  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
+  const scenCols = scenarios.map(s => {
+    const r = mrs.find(r => r.scenario === s);
+    if (!r) return '<td>-</td>';
+    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
+  }).join('');
+  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
+}).sort((a,b) => b.avg - a.avg);
+sumBody.innerHTML = modelRows.map(r => r.html).join('');
+
+// Results table
+const resHead = document.querySelector('#results-table thead');
+const resBody = document.querySelector('#results-table tbody');
+const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
+resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
+
+let sortCol = 9, sortAsc = false;
+function renderResults() {
+  const sorted = [...DATA].sort((a,b) => {
+    const vals = [
+      [a.model, b.model],
+      [a.scenario, b.scenario],
+      [a.specEntities, b.specEntities],
+      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
+      [a.fixRounds, b.fixRounds],
+      [a.promptTokensEst, b.promptTokensEst],
+      [a.totalTokens, b.totalTokens],
+      [a.totalDurationMs, b.totalDurationMs],
+      [a.avgTokPerSec, b.avgTokPerSec],
+      [a.score, b.score],
+    ][sortCol];
+    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
+    return sortAsc ? cmp : -cmp;
+  });
+  resBody.innerHTML = sorted.map(r => {
+    const c = cls(r);
+    return `<tr>
+      <td class="model-name">${r.model}</td>
+      <td>${r.scenario}</td>
+      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
+      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
+      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
+      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
+      <td>${r.avgTokPerSec.toFixed(0)}</td>
+      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
+    </tr>`;
+  }).join('');
+  document.querySelectorAll('#results-table th').forEach((th,i) => {
+    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
+  });
+}
+document.querySelector('#results-table thead').addEventListener('click', e => {
+  const col = parseInt(e.target.dataset.col, 10);
+  if (isNaN(col)) return;
+  if (sortCol === col) sortAsc = !sortAsc;
+  else { sortCol = col; sortAsc = false; }
+  renderResults();
+});
+renderResults();
+</script>
+</body>
+</html>
--- a/kipina-codebench/results/2026-04-14T11-54.json
+++ b/kipina-codebench/results/2026-04-14T11-54.json
@@ -0,0 +1,117 @@
+[
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 79193,
+    "totalTokens": 10304,
+    "avgTokPerSec": 141.2083113764173,
+    "promptChars": 12199,
+    "promptTokensEst": 3050,
+    "score": 0,
+    "stars": "☆☆☆☆☆",
+    "error": "Testit kaatuivat",
+    "profile": "small",
+    "promptName": "code-small",
+    "round": 1
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 10,
+    "testsPassed": 6,
+    "testsFailed": 4,
+    "totalDurationMs": 66764,
+    "totalTokens": 8896,
+    "avgTokPerSec": 142.57944640796882,
+    "promptChars": 12391,
+    "promptTokensEst": 3098,
+    "score": 56,
+    "stars": "★★★☆☆",
+    "error": null,
+    "profile": "small",
+    "promptName": "code-small",
+    "round": 2
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 1,
+    "testsPassed": 0,
+    "testsFailed": 1,
+    "totalDurationMs": 76403,
+    "totalTokens": 9962,
+    "avgTokPerSec": 137.0023398819064,
+    "promptChars": 12432,
+    "promptTokensEst": 3108,
+    "score": 20,
+    "stars": "★☆☆☆☆",
+    "error": "Syntaksivirhe",
+    "profile": "small",
+    "promptName": "code-small",
+    "round": 3
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 13,
+    "testsPassed": 7,
+    "testsFailed": 6,
+    "totalDurationMs": 81345,
+    "totalTokens": 10535,
+    "avgTokPerSec": 139.42076339875726,
+    "promptChars": 11419,
+    "promptTokensEst": 2855,
+    "score": 52,
+    "stars": "★★★☆☆",
+    "error": null,
+    "profile": "small",
+    "promptName": "code-small",
+    "round": 4
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 12,
+    "testsPassed": 11,
+    "testsFailed": 1,
+    "totalDurationMs": 72723,
+    "totalTokens": 9567,
+    "avgTokPerSec": 141.2709378394512,
+    "promptChars": 11416,
+    "promptTokensEst": 2854,
+    "score": 75,
+    "stars": "★★★★☆",
+    "error": null,
+    "profile": "small",
+    "promptName": "code-small",
+    "round": 5
+  }
+]
--- a/kipina-codebench/results/2026-04-14T11-55.html
+++ b/kipina-codebench/results/2026-04-14T11-55.html
@@ -0,0 +1,183 @@
+<!DOCTYPE html>
+<html lang="fi">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Kipina Model Benchmark</title>
+<style>
+  :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
+  * { box-sizing: border-box; margin: 0; padding: 0; }
+  body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
+  h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
+  .meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
+  .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
+  .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
+  .card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
+  .card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
+  .card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
+  table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
+  th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
+  th:hover { color: var(--text); }
+  th.sorted-asc::after { content: ' ▲'; }
+  th.sorted-desc::after { content: ' ▼'; }
+  td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
+  tr:hover td { background: #1c2128; }
+  .pass { color: var(--green); }
+  .partial { color: var(--yellow); }
+  .fail { color: var(--red); }
+  .stars { letter-spacing: 1px; }
+  .bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
+  .bar-bg { background: var(--border); }
+  .bar-fill { background: var(--green); }
+  .bar-partial { background: var(--yellow); }
+  .model-name { font-weight: 600; }
+  h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
+  .summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
+</style>
+</head>
+<body>
+
+<h1>Kipina Model Benchmark</h1>
+<div class="meta" id="meta"></div>
+
+<div class="cards" id="cards"></div>
+
+<h2>Mallikohtainen yhteenveto</h2>
+<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>
+
+<h2>Kaikki tulokset</h2>
+<table id="results-table"><thead></thead><tbody></tbody></table>
+
+<script>
+const RAW = [{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":false,"specEntities":0,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":0,"promptTokensEst":0,"score":0,"stars":"","error":"JSON-speksi epäonnistui","round":1},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":3,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":56798,"totalTokens":5105,"avgTokPerSec":99.4097006568848,"promptChars":11326,"promptTokensEst":2832,"score":0,"stars":"☆☆☆☆☆","error":"Testit kaatuivat","profile":"small","promptName":"code-small","round":2},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":114297,"totalTokens":10163,"avgTokPerSec":97.19131591932717,"promptChars":12182,"promptTokensEst":3046,"score":0,"stars":"☆☆☆☆☆","error":"Testit kaatuivat","profile":"small","promptName":"code-small","round":3},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":false,"specEntities":0,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":0,"promptTokensEst":0,"score":0,"stars":"","error":"JSON-speksi epäonnistui","round":4},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":12,"testsPassed":11,"testsFailed":1,"totalDurationMs":112008,"totalTokens":9892,"avgTokPerSec":97.0586619009377,"promptChars":12406,"promptTokensEst":3102,"score":75,"stars":"★★★★☆","error":null,"profile":"small","promptName":"code-small","round":5}];
+
+const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
+function calcScore(r) {
+  if (r.error && r.testsTotal === 0) return 0;
+  let s = 0;
+  if (r.specOk) s += 10;
+  if (!r.error || r.testsTotal > 0) s += 10;
+  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
+  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
+  return Math.min(100, s);
+}
+// Compute score, stars, and prompt-token estimate if they are missing
+const DATA = RAW.map(r => {
+  if (r.score == null) r.score = calcScore(r);
+  if (!r.stars) r.stars = starsFor(r.score);
+  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
+  return r;
+});
+const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
+const pctBar = (passed, total, w=80) => {
+  if (total === 0) return '-';
+  const pct = passed/total*100;
+  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
+  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
+};
+
+// Meta
+const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
+document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;
+
+// Cards
+const models = [...new Set(DATA.map(r => r.model))];
+const scenarios = [...new Set(DATA.map(r => r.scenario))];
+const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
+const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
+const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
+const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
+const bestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
+}).sort((a,b) => b.avg - a.avg)[0];
+const fastestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
+}).sort((a,b) => b.speed - a.speed)[0];
+
+document.getElementById('cards').innerHTML = `
+  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistettä</div></div>
+  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
+  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
+  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
+  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
+  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
+`;
+
+// Summary table
+const sumHead = document.querySelector('#summary-table thead');
+const sumBody = document.querySelector('#summary-table tbody');
+sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';
+
+const modelRows = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
+  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
+  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
+  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
+  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
+  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
+  const scenCols = scenarios.map(s => {
+    const r = mrs.find(r => r.scenario === s);
+    if (!r) return '<td>-</td>';
+    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
+  }).join('');
+  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
+}).sort((a,b) => b.avg - a.avg);
+sumBody.innerHTML = modelRows.map(r => r.html).join('');
+
+// Results table
+const resHead = document.querySelector('#results-table thead');
+const resBody = document.querySelector('#results-table tbody');
+const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
+resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
+
+let sortCol = 9, sortAsc = false;
+function renderResults() {
+  const sorted = [...DATA].sort((a,b) => {
+    const vals = [
+      [a.model, b.model],
+      [a.scenario, b.scenario],
+      [a.specEntities, b.specEntities],
+      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
+      [a.fixRounds, b.fixRounds],
+      [a.promptTokensEst, b.promptTokensEst],
+      [a.totalTokens, b.totalTokens],
+      [a.totalDurationMs, b.totalDurationMs],
+      [a.avgTokPerSec, b.avgTokPerSec],
+      [a.score, b.score],
+    ][sortCol];
+    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
+    return sortAsc ? cmp : -cmp;
+  });
+  resBody.innerHTML = sorted.map(r => {
+    const c = cls(r);
+    return `<tr>
+      <td class="model-name">${r.model}</td>
+      <td>${r.scenario}</td>
+      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
+      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
+      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
+      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
+      <td>${r.avgTokPerSec.toFixed(0)}</td>
+      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
+    </tr>`;
+  }).join('');
+  document.querySelectorAll('#results-table th').forEach((th,i) => {
+    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
+  });
+}
+document.querySelector('#results-table thead').addEventListener('click', e => {
+  const col = parseInt(e.target.dataset.col, 10);
+  if (isNaN(col)) return;
+  if (sortCol === col) sortAsc = !sortAsc;
+  else { sortCol = col; sortAsc = false; }
+  renderResults();
+});
+renderResults();
+</script>
+</body>
+</html>
--- a/kipina-codebench/results/2026-04-14T11-55.json
+++ b/kipina-codebench/results/2026-04-14T11-55.json
@@ -0,0 +1,113 @@
+[
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": false,
+    "specEntities": 0,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 0,
+    "promptTokensEst": 0,
+    "score": 0,
+    "stars": "",
+    "error": "JSON-speksi epäonnistui",
+    "round": 1
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 56798,
+    "totalTokens": 5105,
+    "avgTokPerSec": 99.4097006568848,
+    "promptChars": 11326,
+    "promptTokensEst": 2832,
+    "score": 0,
+    "stars": "☆☆☆☆☆",
+    "error": "Testit kaatuivat",
+    "profile": "small",
+    "promptName": "code-small",
+    "round": 2
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 114297,
+    "totalTokens": 10163,
+    "avgTokPerSec": 97.19131591932717,
+    "promptChars": 12182,
+    "promptTokensEst": 3046,
+    "score": 0,
+    "stars": "☆☆☆☆☆",
+    "error": "Testit kaatuivat",
+    "profile": "small",
+    "promptName": "code-small",
+    "round": 3
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": false,
+    "specEntities": 0,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 0,
+    "promptTokensEst": 0,
+    "score": 0,
+    "stars": "",
+    "error": "JSON-speksi epäonnistui",
+    "round": 4
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 12,
+    "testsPassed": 11,
+    "testsFailed": 1,
+    "totalDurationMs": 112008,
+    "totalTokens": 9892,
+    "avgTokPerSec": 97.0586619009377,
+    "promptChars": 12406,
+    "promptTokensEst": 3102,
+    "score": 75,
+    "stars": "★★★★☆",
+    "error": null,
+    "profile": "small",
+    "promptName": "code-small",
+    "round": 5
+  }
+]
--- a/kipina-codebench/results/2026-04-14T12-01.html
+++ b/kipina-codebench/results/2026-04-14T12-01.html
@@ -0,0 +1,183 @@
+<!DOCTYPE html>
+<html lang="fi">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Kipina Model Benchmark</title>
+<style>
+  :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
+  * { box-sizing: border-box; margin: 0; padding: 0; }
+  body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
+  h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
+  .meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
+  .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
+  .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
+  .card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
+  .card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
+  .card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
+  table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
+  th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
+  th:hover { color: var(--text); }
+  th.sorted-asc::after { content: ' ▲'; }
+  th.sorted-desc::after { content: ' ▼'; }
+  td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
+  tr:hover td { background: #1c2128; }
+  .pass { color: var(--green); }
+  .partial { color: var(--yellow); }
+  .fail { color: var(--red); }
+  .stars { letter-spacing: 1px; }
+  .bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
+  .bar-bg { background: var(--border); }
+  .bar-fill { background: var(--green); }
+  .bar-partial { background: var(--yellow); }
+  .model-name { font-weight: 600; }
+  h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
+  .summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
+</style>
+</head>
+<body>
+
+<h1>Kipina Model Benchmark</h1>
+<div class="meta" id="meta"></div>
+
+<div class="cards" id="cards"></div>
+
+<h2>Mallikohtainen yhteenveto</h2>
+<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>
+
+<h2>Kaikki tulokset</h2>
+<table id="results-table"><thead></thead><tbody></tbody></table>
+
+<script>
+const RAW = [{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":false,"specEntities":0,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":0,"promptTokensEst":0,"score":0,"stars":"","error":"JSON-speksi epäonnistui","round":1},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":11,"testsPassed":11,"testsFailed":0,"totalDurationMs":143640,"totalTokens":12611,"avgTokPerSec":96.28061629672216,"promptChars":12125,"promptTokensEst":3031,"score":80,"stars":"★★★★☆","error":null,"profile":"small","promptName":"code-small","round":2},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":2,"testsTotal":12,"testsPassed":12,"testsFailed":0,"totalDurationMs":116061,"totalTokens":10181,"avgTokPerSec":96.63321228455318,"promptChars":12435,"promptTokensEst":3109,"score":80,"stars":"★★★★☆","error":null,"profile":"small","promptName":"code-small","round":3},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":false,"specEntities":0,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":0,"promptTokensEst":0,"score":0,"stars":"","error":"JSON-speksi epäonnistui","round":4},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":2,"testsTotal":11,"testsPassed":11,"testsFailed":0,"totalDurationMs":113792,"totalTokens":10022,"avgTokPerSec":96.96815077469971,"promptChars":12260,"promptTokensEst":3065,"score":80,"stars":"★★★★☆","error":null,"profile":"small","promptName":"code-small","round":5}];
+
+const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
+function calcScore(r) {
+  if (r.error && r.testsTotal === 0) return 0;
+  let s = 0;
+  if (r.specOk) s += 10;
+  if (!r.error || r.testsTotal > 0) s += 10;
+  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
+  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
+  return Math.min(100, s);
+}
+// Compute the score, stars, and prompt-token estimate when they are missing
+const DATA = RAW.map(r => {
+  if (r.score == null) r.score = calcScore(r);
+  if (!r.stars) r.stars = starsFor(r.score);
+  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
+  return r;
+});
+const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
+const pctBar = (passed, total, w=80) => {
+  if (total === 0) return '-';
+  const pct = passed/total*100;
+  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
+  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
+};
+
+// Meta
+const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
+document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;
+
+// Cards
+const models = [...new Set(DATA.map(r => r.model))];
+const scenarios = [...new Set(DATA.map(r => r.scenario))];
+const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
+const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
+const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
+const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
+const bestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
+}).sort((a,b) => b.avg - a.avg)[0];
+const fastestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
+}).sort((a,b) => b.speed - a.speed)[0];
+
+document.getElementById('cards').innerHTML = `
+  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistettä</div></div>
+  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
+  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
+  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
+  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
+  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
+`;
+
+// Summary table
+const sumHead = document.querySelector('#summary-table thead');
+const sumBody = document.querySelector('#summary-table tbody');
+sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';
+
+const modelRows = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
+  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
+  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
+  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
+  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
+  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
+  const scenCols = scenarios.map(s => {
+    const r = mrs.find(r => r.scenario === s);
+    if (!r) return '<td>-</td>';
+    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
+  }).join('');
+  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
+}).sort((a,b) => b.avg - a.avg);
+sumBody.innerHTML = modelRows.map(r => r.html).join('');
+
+// Results table
+const resHead = document.querySelector('#results-table thead');
+const resBody = document.querySelector('#results-table tbody');
+const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
+resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
+
+let sortCol = 9, sortAsc = false;
+function renderResults() {
+  const sorted = [...DATA].sort((a,b) => {
+    const vals = [
+      [a.model, b.model],
+      [a.scenario, b.scenario],
+      [a.specEntities, b.specEntities],
+      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
+      [a.fixRounds, b.fixRounds],
+      [a.promptTokensEst, b.promptTokensEst],
+      [a.totalTokens, b.totalTokens],
+      [a.totalDurationMs, b.totalDurationMs],
+      [a.avgTokPerSec, b.avgTokPerSec],
+      [a.score, b.score],
+    ][sortCol];
+    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
+    return sortAsc ? cmp : -cmp;
+  });
+  resBody.innerHTML = sorted.map(r => {
+    const c = cls(r);
+    return `<tr>
+      <td class="model-name">${r.model}</td>
+      <td>${r.scenario}</td>
+      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
+      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
+      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
+      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
+      <td>${r.avgTokPerSec.toFixed(0)}</td>
+      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
+    </tr>`;
+  }).join('');
+  document.querySelectorAll('#results-table th').forEach((th,i) => {
+    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
+  });
+}
+document.querySelector('#results-table thead').addEventListener('click', e => {
+  const col = parseInt(e.target.dataset.col, 10);
+  if (isNaN(col)) return;
+  if (sortCol === col) sortAsc = !sortAsc;
+  else { sortCol = col; sortAsc = false; }
+  renderResults();
+});
+renderResults();
+</script>
+</body>
+</html>
--- a/kipina-codebench/results/2026-04-14T12-01.json
+++ b/kipina-codebench/results/2026-04-14T12-01.json
@@ -0,0 +1,113 @@
+[
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": false,
+    "specEntities": 0,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 0,
+    "promptTokensEst": 0,
+    "score": 0,
+    "stars": "",
+    "error": "JSON-speksi epäonnistui",
+    "round": 1
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 11,
+    "testsPassed": 11,
+    "testsFailed": 0,
+    "totalDurationMs": 143640,
+    "totalTokens": 12611,
+    "avgTokPerSec": 96.28061629672216,
+    "promptChars": 12125,
+    "promptTokensEst": 3031,
+    "score": 80,
+    "stars": "★★★★☆",
+    "error": null,
+    "profile": "small",
+    "promptName": "code-small",
+    "round": 2
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 2,
+    "testsTotal": 12,
+    "testsPassed": 12,
+    "testsFailed": 0,
+    "totalDurationMs": 116061,
+    "totalTokens": 10181,
+    "avgTokPerSec": 96.63321228455318,
+    "promptChars": 12435,
+    "promptTokensEst": 3109,
+    "score": 80,
+    "stars": "★★★★☆",
+    "error": null,
+    "profile": "small",
+    "promptName": "code-small",
+    "round": 3
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": false,
+    "specEntities": 0,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 0,
+    "promptTokensEst": 0,
+    "score": 0,
+    "stars": "",
+    "error": "JSON-speksi epäonnistui",
+    "round": 4
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 2,
+    "testsTotal": 11,
+    "testsPassed": 11,
+    "testsFailed": 0,
+    "totalDurationMs": 113792,
+    "totalTokens": 10022,
+    "avgTokPerSec": 96.96815077469971,
+    "promptChars": 12260,
+    "promptTokensEst": 3065,
+    "score": 80,
+    "stars": "★★★★☆",
+    "error": null,
+    "profile": "small",
+    "promptName": "code-small",
+    "round": 5
+  }
+]
--- a/kipina-codebench/results/2026-04-14T13-11.html
+++ b/kipina-codebench/results/2026-04-14T13-11.html
@@ -0,0 +1,183 @@
+<!DOCTYPE html>
+<html lang="fi">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Kipina Model Benchmark</title>
+<style>
+  :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
+  * { box-sizing: border-box; margin: 0; padding: 0; }
+  body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
+  h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
+  .meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
+  .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
+  .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
+  .card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
+  .card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
+  .card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
+  table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
+  th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
+  th:hover { color: var(--text); }
+  th.sorted-asc::after { content: ' ▲'; }
+  th.sorted-desc::after { content: ' ▼'; }
+  td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
+  tr:hover td { background: #1c2128; }
+  .pass { color: var(--green); }
+  .partial { color: var(--yellow); }
+  .fail { color: var(--red); }
+  .stars { letter-spacing: 1px; }
+  .bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
+  .bar-bg { background: var(--border); }
+  .bar-fill { background: var(--green); }
+  .bar-partial { background: var(--yellow); }
+  .model-name { font-weight: 600; }
+  h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
+  .summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
+</style>
+</head>
+<body>
+
+<h1>Kipina Model Benchmark</h1>
+<div class="meta" id="meta"></div>
+
+<div class="cards" id="cards"></div>
+
+<h2>Mallikohtainen yhteenveto</h2>
+<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>
+
+<h2>Kaikki tulokset</h2>
+<table id="results-table"><thead></thead><tbody></tbody></table>
+
+<script>
+const RAW = [{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":1,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":10508,"promptTokensEst":2627,"score":0,"stars":"","error":"Puuttuvat: Cargo.toml, src/models.rs, src/handlers.rs, src/lib.rs, src/main.rs, tests/api_test.rs","round":1},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":false,"specEntities":0,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":0,"promptTokensEst":0,"score":0,"stars":"","error":"JSON-speksi epäonnistui","round":2},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":false,"specEntities":0,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":0,"promptTokensEst":0,"score":0,"stars":"","error":"JSON-speksi epäonnistui","round":3},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":false,"specEntities":0,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":0,"promptTokensEst":0,"score":0,"stars":"","error":"JSON-speksi epäonnistui","round":4},{"model":"qwen3:8b","scenario":"blog","reqOk":true,"specOk":false,"specEntities":0,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":0,"promptTokensEst":0,"score":0,"stars":"","error":"JSON-speksi epäonnistui","round":5}];
+
+const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
+function calcScore(r) {
+  if (r.error && r.testsTotal === 0) return 0;
+  let s = 0;
+  if (r.specOk) s += 10;
+  if (!r.error || r.testsTotal > 0) s += 10;
+  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
+  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
+  return Math.min(100, s);
+}
+// Compute the score, stars, and prompt-token estimate when they are missing
+const DATA = RAW.map(r => {
+  if (r.score == null) r.score = calcScore(r);
+  if (!r.stars) r.stars = starsFor(r.score);
+  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
+  return r;
+});
+const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
+const pctBar = (passed, total, w=80) => {
+  if (total === 0) return '-';
+  const pct = passed/total*100;
+  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
+  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
+};
+
+// Meta
+const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
+document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;
+
+// Cards
+const models = [...new Set(DATA.map(r => r.model))];
+const scenarios = [...new Set(DATA.map(r => r.scenario))];
+const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
+const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
+const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
+const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
+const bestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
+}).sort((a,b) => b.avg - a.avg)[0];
+const fastestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
+}).sort((a,b) => b.speed - a.speed)[0];
+
+document.getElementById('cards').innerHTML = `
+  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistettä</div></div>
+  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
+  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
+  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
+  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
+  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
+`;
+
+// Summary table
+const sumHead = document.querySelector('#summary-table thead');
+const sumBody = document.querySelector('#summary-table tbody');
+sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';
+
+const modelRows = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
+  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
+  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
+  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
+  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
+  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
+  const scenCols = scenarios.map(s => {
+    const r = mrs.find(r => r.scenario === s);
+    if (!r) return '<td>-</td>';
+    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
+  }).join('');
+  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
+}).sort((a,b) => b.avg - a.avg);
+sumBody.innerHTML = modelRows.map(r => r.html).join('');
+
+// Results table
+const resHead = document.querySelector('#results-table thead');
+const resBody = document.querySelector('#results-table tbody');
+const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
+resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
+
+let sortCol = 9, sortAsc = false;
+function renderResults() {
+  const sorted = [...DATA].sort((a,b) => {
+    const vals = [
+      [a.model, b.model],
+      [a.scenario, b.scenario],
+      [a.specEntities, b.specEntities],
+      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
+      [a.fixRounds, b.fixRounds],
+      [a.promptTokensEst, b.promptTokensEst],
+      [a.totalTokens, b.totalTokens],
+      [a.totalDurationMs, b.totalDurationMs],
+      [a.avgTokPerSec, b.avgTokPerSec],
+      [a.score, b.score],
+    ][sortCol];
+    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
+    return sortAsc ? cmp : -cmp;
+  });
+  resBody.innerHTML = sorted.map(r => {
+    const c = cls(r);
+    return `<tr>
+      <td class="model-name">${r.model}</td>
+      <td>${r.scenario}</td>
+      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
+      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
+      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
+      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
+      <td>${r.avgTokPerSec.toFixed(0)}</td>
+      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
+    </tr>`;
+  }).join('');
+  document.querySelectorAll('#results-table th').forEach((th,i) => {
+    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
+  });
+}
+document.querySelector('#results-table thead').addEventListener('click', e => {
+  const col = parseInt(e.target.dataset.col, 10);
+  if (isNaN(col)) return;
+  if (sortCol === col) sortAsc = !sortAsc;
+  else { sortCol = col; sortAsc = false; }
+  renderResults();
+});
+renderResults();
+</script>
+</body>
+</html>
--- a/kipina-codebench/results/2026-04-14T13-11.json
+++ b/kipina-codebench/results/2026-04-14T13-11.json
@@ -0,0 +1,107 @@
+[
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 1,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 10508,
+    "promptTokensEst": 2627,
+    "score": 0,
+    "stars": "",
+    "error": "Puuttuvat: Cargo.toml, src/models.rs, src/handlers.rs, src/lib.rs, src/main.rs, tests/api_test.rs",
+    "round": 1
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": false,
+    "specEntities": 0,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 0,
+    "promptTokensEst": 0,
+    "score": 0,
+    "stars": "",
+    "error": "JSON-speksi epäonnistui",
+    "round": 2
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": false,
+    "specEntities": 0,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 0,
+    "promptTokensEst": 0,
+    "score": 0,
+    "stars": "",
+    "error": "JSON-speksi epäonnistui",
+    "round": 3
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": false,
+    "specEntities": 0,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 0,
+    "promptTokensEst": 0,
+    "score": 0,
+    "stars": "",
+    "error": "JSON-speksi epäonnistui",
+    "round": 4
+  },
+  {
+    "model": "qwen3:8b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": false,
+    "specEntities": 0,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 0,
+    "promptTokensEst": 0,
+    "score": 0,
+    "stars": "",
+    "error": "JSON-speksi epäonnistui",
+    "round": 5
+  }
+]
--- a/kipina-codebench/results/2026-04-14T13-12.html
+++ b/kipina-codebench/results/2026-04-14T13-12.html
@@ -0,0 +1,183 @@
+<!DOCTYPE html>
+<html lang="fi">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Kipina Model Benchmark</title>
+<style>
+  :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
+  * { box-sizing: border-box; margin: 0; padding: 0; }
+  body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
+  h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
+  .meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
+  .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
+  .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
+  .card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
+  .card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
+  .card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
+  table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
+  th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
+  th:hover { color: var(--text); }
+  th.sorted-asc::after { content: ' ▲'; }
+  th.sorted-desc::after { content: ' ▼'; }
+  td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
+  tr:hover td { background: #1c2128; }
+  .pass { color: var(--green); }
+  .partial { color: var(--yellow); }
+  .fail { color: var(--red); }
+  .stars { letter-spacing: 1px; }
+  .bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
+  .bar-bg { background: var(--border); }
+  .bar-fill { background: var(--green); }
+  .bar-partial { background: var(--yellow); }
+  .model-name { font-weight: 600; }
+  h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
+  .summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
+</style>
+</head>
+<body>
+
+<h1>Kipina Model Benchmark</h1>
+<div class="meta" id="meta"></div>
+
+<div class="cards" id="cards"></div>
+
+<h2>Per-model summary</h2>
+<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>
+
+<h2>All results</h2>
+<table id="results-table"><thead></thead><tbody></tbody></table>
+
+<script>
+const RAW = [{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":3,"testsPassed":0,"testsFailed":3,"totalDurationMs":217110,"totalTokens":21602,"avgTokPerSec":114.70956637458333,"promptChars":12612,"promptTokensEst":3153,"score":20,"stars":"★☆☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":1},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":3,"testsPassed":0,"testsFailed":3,"totalDurationMs":204772,"totalTokens":20717,"avgTokPerSec":114.45999021594592,"promptChars":12743,"promptTokensEst":3186,"score":20,"stars":"★☆☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":2},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":4,"testsPassed":0,"testsFailed":4,"totalDurationMs":180501,"totalTokens":18467,"avgTokPerSec":115.23583963958032,"promptChars":12392,"promptTokensEst":3098,"score":20,"stars":"★☆☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":3},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":25,"testsPassed":0,"testsFailed":25,"totalDurationMs":282681,"totalTokens":27665,"avgTokPerSec":111.29688837623901,"promptChars":12675,"promptTokensEst":3169,"score":20,"stars":"★☆☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":4},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":5,"testsPassed":0,"testsFailed":5,"totalDurationMs":171686,"totalTokens":17525,"avgTokPerSec":114.88288274375243,"promptChars":12618,"promptTokensEst":3155,"score":20,"stars":"★☆☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":5}];
+
+const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
+function calcScore(r) {
+  if (r.error && r.testsTotal === 0) return 0;
+  let s = 0;
+  if (r.specOk) s += 10;
+  if (!r.error || r.testsTotal > 0) s += 10;
+  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
+  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
+  return Math.min(100, s);
+}
+// Compute score, stars, and prompt token estimate when they are missing
+const DATA = RAW.map(r => {
+  if (r.score == null) r.score = calcScore(r);
+  if (!r.stars) r.stars = starsFor(r.score);
+  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
+  return r;
+});
+const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
+const pctBar = (passed, total, w=80) => {
+  if (total === 0) return '-';
+  const pct = passed/total*100;
+  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
+  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
+};
+
+// Meta
+const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
+document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} - ${DATA.length} runs - ${(totalTime/1000/60).toFixed(1)} min`;
+
+// Cards
+const models = [...new Set(DATA.map(r => r.model))];
+const scenarios = [...new Set(DATA.map(r => r.scenario))];
+const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
+const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
+const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
+const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
+const bestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
+}).sort((a,b) => b.avg - a.avg)[0];
+const fastestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
+}).sort((a,b) => b.speed - a.speed)[0];
+
+document.getElementById('cards').innerHTML = `
+  <div class="card"><div class="label">Average</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} points</div></div>
+  <div class="card"><div class="label">Test pass rate</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} tests</div></div>
+  <div class="card"><div class="label">Best model</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
+  <div class="card"><div class="label">Fastest</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
+  <div class="card"><div class="label">Models</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} scenarios</div></div>
+  <div class="card"><div class="label">Total time</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minutes</div></div>
+`;
+
+// Summary table
+const sumHead = document.querySelector('#summary-table thead');
+const sumBody = document.querySelector('#summary-table tbody');
+sumHead.innerHTML = '<tr><th>Model</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Total</th><th>Out tok</th><th>Time</th><th>tok/s</th><th>Score</th></tr>';
+
+const modelRows = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
+  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
+  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
+  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
+  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
+  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
+  const scenCols = scenarios.map(s => {
+    const r = mrs.find(r => r.scenario === s);
+    if (!r) return '<td>-</td>';
+    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
+  }).join('');
+  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
+}).sort((a,b) => b.avg - a.avg);
+sumBody.innerHTML = modelRows.map(r => r.html).join('');
+
+// Results table
+const resHead = document.querySelector('#results-table thead');
+const resBody = document.querySelector('#results-table tbody');
+const resCols = ['Model','Scenario','Spec','Tests','Fix rounds','Ctx','Out tok','Time','tok/s','Score'];
+resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
+
+let sortCol = 9, sortAsc = false;
+function renderResults() {
+  const sorted = [...DATA].sort((a,b) => {
+    const vals = [
+      [a.model, b.model],
+      [a.scenario, b.scenario],
+      [a.specEntities, b.specEntities],
+      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
+      [a.fixRounds, b.fixRounds],
+      [a.promptTokensEst, b.promptTokensEst],
+      [a.totalTokens, b.totalTokens],
+      [a.totalDurationMs, b.totalDurationMs],
+      [a.avgTokPerSec, b.avgTokPerSec],
+      [a.score, b.score],
+    ][sortCol];
+    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
+    return sortAsc ? cmp : -cmp;
+  });
+  resBody.innerHTML = sorted.map(r => {
+    const c = cls(r);
+    return `<tr>
+      <td class="model-name">${r.model}</td>
+      <td>${r.scenario}</td>
+      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
+      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
+      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
+      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
+      <td>${r.avgTokPerSec.toFixed(0)}</td>
+      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
+    </tr>`;
+  }).join('');
+  document.querySelectorAll('#results-table th').forEach((th,i) => {
+    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
+  });
+}
+document.querySelector('#results-table thead').addEventListener('click', e => {
+  const col = parseInt(e.target.dataset.col);
+  if (isNaN(col)) return;
+  if (sortCol === col) sortAsc = !sortAsc;
+  else { sortCol = col; sortAsc = false; }
+  renderResults();
+});
+renderResults();
+</script>
+</body>
+</html>
--- a/kipina-codebench/results/2026-04-14T13-12.json
+++ b/kipina-codebench/results/2026-04-14T13-12.json
@@ -0,0 +1,117 @@
+[
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 3,
+    "testsPassed": 0,
+    "testsFailed": 3,
+    "totalDurationMs": 217110,
+    "totalTokens": 21602,
+    "avgTokPerSec": 114.70956637458333,
+    "promptChars": 12612,
+    "promptTokensEst": 3153,
+    "score": 20,
+    "stars": "★☆☆☆☆",
+    "error": null,
+    "profile": "large",
+    "promptName": "code-rs",
+    "round": 1
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 3,
+    "testsPassed": 0,
+    "testsFailed": 3,
+    "totalDurationMs": 204772,
+    "totalTokens": 20717,
+    "avgTokPerSec": 114.45999021594592,
+    "promptChars": 12743,
+    "promptTokensEst": 3186,
+    "score": 20,
+    "stars": "★☆☆☆☆",
+    "error": null,
+    "profile": "large",
+    "promptName": "code-rs",
+    "round": 2
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 4,
+    "testsPassed": 0,
+    "testsFailed": 4,
+    "totalDurationMs": 180501,
+    "totalTokens": 18467,
+    "avgTokPerSec": 115.23583963958032,
+    "promptChars": 12392,
+    "promptTokensEst": 3098,
+    "score": 20,
+    "stars": "★☆☆☆☆",
+    "error": null,
+    "profile": "large",
+    "promptName": "code-rs",
+    "round": 3
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 25,
+    "testsPassed": 0,
+    "testsFailed": 25,
+    "totalDurationMs": 282681,
+    "totalTokens": 27665,
+    "avgTokPerSec": 111.29688837623901,
+    "promptChars": 12675,
+    "promptTokensEst": 3169,
+    "score": 20,
+    "stars": "★☆☆☆☆",
+    "error": null,
+    "profile": "large",
+    "promptName": "code-rs",
+    "round": 4
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 5,
+    "testsPassed": 0,
+    "testsFailed": 5,
+    "totalDurationMs": 171686,
+    "totalTokens": 17525,
+    "avgTokPerSec": 114.88288274375243,
+    "promptChars": 12618,
+    "promptTokensEst": 3155,
+    "score": 20,
+    "stars": "★☆☆☆☆",
+    "error": null,
+    "profile": "large",
+    "promptName": "code-rs",
+    "round": 5
+  }
+]
--- a/kipina-codebench/results/2026-04-14T13-42.html
+++ b/kipina-codebench/results/2026-04-14T13-42.html
@@ -0,0 +1,183 @@
+<!DOCTYPE html>
+<html lang="fi">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Kipina Model Benchmark</title>
+<style>
+  :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
+  * { box-sizing: border-box; margin: 0; padding: 0; }
+  body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
+  h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
+  .meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
+  .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
+  .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
+  .card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
+  .card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
+  .card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
+  table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
+  th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
+  th:hover { color: var(--text); }
+  th.sorted-asc::after { content: ' ▲'; }
+  th.sorted-desc::after { content: ' ▼'; }
+  td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
+  tr:hover td { background: #1c2128; }
+  .pass { color: var(--green); }
+  .partial { color: var(--yellow); }
+  .fail { color: var(--red); }
+  .stars { letter-spacing: 1px; }
+  .bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
+  .bar-bg { background: var(--border); }
+  .bar-fill { background: var(--green); }
+  .bar-partial { background: var(--yellow); }
+  .model-name { font-weight: 600; }
+  h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
+  .summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
+</style>
+</head>
+<body>
+
+<h1>Kipina Model Benchmark</h1>
+<div class="meta" id="meta"></div>
+
+<div class="cards" id="cards"></div>
+
+<h2>Per-model summary</h2>
+<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>
+
+<h2>All results</h2>
+<table id="results-table"><thead></thead><tbody></tbody></table>
+
+<script>
+const RAW = [{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":18,"testsPassed":0,"testsFailed":18,"totalDurationMs":208078,"totalTokens":20783,"avgTokPerSec":114.94478559756693,"promptChars":13278,"promptTokensEst":3320,"score":20,"stars":"★☆☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":1},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":13362,"promptTokensEst":3341,"score":0,"stars":"","error":"Puuttuvat: src/lib.rs, src/main.rs, tests/api_test.rs","round":2},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":9,"testsPassed":0,"testsFailed":9,"totalDurationMs":221174,"totalTokens":22354,"avgTokPerSec":114.09551344946065,"promptChars":13234,"promptTokensEst":3309,"score":20,"stars":"★☆☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":3},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":13317,"promptTokensEst":3329,"score":0,"stars":"","error":"Puuttuvat: src/lib.rs, src/main.rs, tests/api_test.rs","round":4},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":8795,"totalTokens":954,"avgTokPerSec":124.86009274372915,"promptChars":13335,"promptTokensEst":3334,"score":0,"stars":"☆☆☆☆☆","error":"fetch failed","profile":"large","promptName":"code-rs","round":5}];
+
+const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
+function calcScore(r) {
+  if (r.error && r.testsTotal === 0) return 0;
+  let s = 0;
+  if (r.specOk) s += 10;
+  if (!r.error || r.testsTotal > 0) s += 10;
+  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
+  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
+  return Math.min(100, s);
+}
+// Compute score, stars, and prompt token estimate when they are missing
+const DATA = RAW.map(r => {
+  if (r.score == null) r.score = calcScore(r);
+  if (!r.stars) r.stars = starsFor(r.score);
+  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
+  return r;
+});
+const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
+const pctBar = (passed, total, w=80) => {
+  if (total === 0) return '-';
+  const pct = passed/total*100;
+  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
+  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
+};
+
+// Meta
+const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
+document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} - ${DATA.length} runs - ${(totalTime/1000/60).toFixed(1)} min`;
+
+// Cards
+const models = [...new Set(DATA.map(r => r.model))];
+const scenarios = [...new Set(DATA.map(r => r.scenario))];
+const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
+const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
+const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
+const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
+const bestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
+}).sort((a,b) => b.avg - a.avg)[0];
+const fastestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
+}).sort((a,b) => b.speed - a.speed)[0];
+
+document.getElementById('cards').innerHTML = `
+  <div class="card"><div class="label">Average</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} points</div></div>
+  <div class="card"><div class="label">Test pass rate</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} tests</div></div>
+  <div class="card"><div class="label">Best model</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
+  <div class="card"><div class="label">Fastest</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
+  <div class="card"><div class="label">Models</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} scenarios</div></div>
+  <div class="card"><div class="label">Total time</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minutes</div></div>
+`;
+
+// Summary table
+const sumHead = document.querySelector('#summary-table thead');
+const sumBody = document.querySelector('#summary-table tbody');
+sumHead.innerHTML = '<tr><th>Model</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Total</th><th>Out tok</th><th>Time</th><th>tok/s</th><th>Score</th></tr>';
+
+const modelRows = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
+  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
+  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
+  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
+  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
+  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
+  const scenCols = scenarios.map(s => {
+    const r = mrs.find(r => r.scenario === s);
+    if (!r) return '<td>-</td>';
+    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
+  }).join('');
+  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
+}).sort((a,b) => b.avg - a.avg);
+sumBody.innerHTML = modelRows.map(r => r.html).join('');
+
+// Results table
+const resHead = document.querySelector('#results-table thead');
+const resBody = document.querySelector('#results-table tbody');
+const resCols = ['Model','Scenario','Spec','Tests','Fix rounds','Ctx','Out tok','Time','tok/s','Score'];
+resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
+
+let sortCol = 9, sortAsc = false;
+function renderResults() {
+  const sorted = [...DATA].sort((a,b) => {
+    const vals = [
+      [a.model, b.model],
+      [a.scenario, b.scenario],
+      [a.specEntities, b.specEntities],
+      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
+      [a.fixRounds, b.fixRounds],
+      [a.promptTokensEst, b.promptTokensEst],
+      [a.totalTokens, b.totalTokens],
+      [a.totalDurationMs, b.totalDurationMs],
+      [a.avgTokPerSec, b.avgTokPerSec],
+      [a.score, b.score],
+    ][sortCol];
+    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
+    return sortAsc ? cmp : -cmp;
+  });
+  resBody.innerHTML = sorted.map(r => {
+    const c = cls(r);
+    return `<tr>
+      <td class="model-name">${r.model}</td>
+      <td>${r.scenario}</td>
+      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
+      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
+      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
+      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
+      <td>${r.avgTokPerSec.toFixed(0)}</td>
+      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
+    </tr>`;
+  }).join('');
+  document.querySelectorAll('#results-table th').forEach((th,i) => {
+    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
+  });
+}
+document.querySelector('#results-table thead').addEventListener('click', e => {
+  const col = parseInt(e.target.dataset.col);
+  if (isNaN(col)) return;
+  if (sortCol === col) sortAsc = !sortAsc;
+  else { sortCol = col; sortAsc = false; }
+  renderResults();
+});
+renderResults();
+</script>
+</body>
+</html>
--- a/kipina-codebench/results/2026-04-14T13-42.json
+++ b/kipina-codebench/results/2026-04-14T13-42.json
@@ -0,0 +1,113 @@
+[
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 18,
+    "testsPassed": 0,
+    "testsFailed": 18,
+    "totalDurationMs": 208078,
+    "totalTokens": 20783,
+    "avgTokPerSec": 114.94478559756693,
+    "promptChars": 13278,
+    "promptTokensEst": 3320,
+    "score": 20,
+    "stars": "★☆☆☆☆",
+    "error": null,
+    "profile": "large",
+    "promptName": "code-rs",
+    "round": 1
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 13362,
+    "promptTokensEst": 3341,
+    "score": 0,
+    "stars": "",
+    "error": "Puuttuvat: src/lib.rs, src/main.rs, tests/api_test.rs",
+    "round": 2
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 9,
+    "testsPassed": 0,
+    "testsFailed": 9,
+    "totalDurationMs": 221174,
+    "totalTokens": 22354,
+    "avgTokPerSec": 114.09551344946065,
+    "promptChars": 13234,
+    "promptTokensEst": 3309,
+    "score": 20,
+    "stars": "★☆☆☆☆",
+    "error": null,
+    "profile": "large",
+    "promptName": "code-rs",
+    "round": 3
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 13317,
+    "promptTokensEst": 3329,
+    "score": 0,
+    "stars": "",
+    "error": "Puuttuvat: src/lib.rs, src/main.rs, tests/api_test.rs",
+    "round": 4
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 8795,
+    "totalTokens": 954,
+    "avgTokPerSec": 124.86009274372915,
+    "promptChars": 13335,
+    "promptTokensEst": 3334,
+    "score": 0,
+    "stars": "☆☆☆☆☆",
+    "error": "fetch failed",
+    "profile": "large",
+    "promptName": "code-rs",
+    "round": 5
+  }
+]
--- a/kipina-codebench/results/2026-04-14T14-12.html
+++ b/kipina-codebench/results/2026-04-14T14-12.html
@@ -0,0 +1,183 @@
+<!DOCTYPE html>
+<html lang="fi">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Kipina Model Benchmark</title>
+<style>
+  :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
+  * { box-sizing: border-box; margin: 0; padding: 0; }
+  body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
+  h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
+  .meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
+  .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
+  .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
+  .card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
+  .card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
+  .card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
+  table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
+  th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
+  th:hover { color: var(--text); }
+  th.sorted-asc::after { content: ' ▲'; }
+  th.sorted-desc::after { content: ' ▼'; }
+  td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
+  tr:hover td { background: #1c2128; }
+  .pass { color: var(--green); }
+  .partial { color: var(--yellow); }
+  .fail { color: var(--red); }
+  .stars { letter-spacing: 1px; }
+  .bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
+  .bar-bg { background: var(--border); }
+  .bar-fill { background: var(--green); }
+  .bar-partial { background: var(--yellow); }
+  .model-name { font-weight: 600; }
+  h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
+  .summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
+</style>
+</head>
+<body>
+
+<h1>Kipina Model Benchmark</h1>
+<div class="meta" id="meta"></div>
+
+<div class="cards" id="cards"></div>
+
+<h2>Mallikohtainen yhteenveto</h2>
+<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>
+
+<h2>Kaikki tulokset</h2>
+<table id="results-table"><thead></thead><tbody></tbody></table>
+
+<script>
+const RAW = [{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":1,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":133173,"totalTokens":13174,"avgTokPerSec":117.52479437665707,"promptChars":14102,"promptTokensEst":3526,"score":30,"stars":"★★☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":1},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":5,"testsPassed":0,"testsFailed":5,"totalDurationMs":267561,"totalTokens":27021,"avgTokPerSec":113.5812238661422,"promptChars":14052,"promptTokensEst":3513,"score":20,"stars":"★☆☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":2},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":0,"totalTokens":0,"avgTokPerSec":0,"promptChars":13914,"promptTokensEst":3479,"score":0,"stars":"","error":"Puuttuvat: src/handlers.rs, src/lib.rs, src/main.rs, tests/api_test.rs","round":3},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":2,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":162271,"totalTokens":16343,"avgTokPerSec":115.53039090208604,"promptChars":14062,"promptTokensEst":3516,"score":20,"stars":"★☆☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":4},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":1,"testsPassed":0,"testsFailed":1,"totalDurationMs":211367,"totalTokens":21183,"avgTokPerSec":113.22772767359652,"promptChars":14038,"promptTokensEst":3510,"score":20,"stars":"★☆☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":5}];
+
+const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
+function calcScore(r) {
+  if (r.error && r.testsTotal === 0) return 0;
+  let s = 0;
+  if (r.specOk) s += 10;
+  if (!r.error || r.testsTotal > 0) s += 10;
+  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
+  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
+  return Math.min(100, s);
+}
+// Fill in score, stars, and prompt-token estimate when they are missing
+const DATA = RAW.map(r => {
+  if (r.score == null) r.score = calcScore(r);
+  if (!r.stars) r.stars = starsFor(r.score);
+  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
+  return r;
+});
+const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
+const pctBar = (passed, total, w=80) => {
+  if (total === 0) return '-';
+  const pct = passed/total*100;
+  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
+  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
+};
+
+// Meta
+const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
+document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;
+
+// Cards
+const models = [...new Set(DATA.map(r => r.model))];
+const scenarios = [...new Set(DATA.map(r => r.scenario))];
+const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
+const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
+const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
+const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
+const bestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
+}).sort((a,b) => b.avg - a.avg)[0];
+const fastestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
+}).sort((a,b) => b.speed - a.speed)[0];
+
+document.getElementById('cards').innerHTML = `
+  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistettä</div></div>
+  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
+  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
+  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
+  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
+  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
+`;
+
+// Summary table
+const sumHead = document.querySelector('#summary-table thead');
+const sumBody = document.querySelector('#summary-table tbody');
+sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';
+
+const modelRows = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
+  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
+  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
+  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
+  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
+  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
+  const scenCols = scenarios.map(s => {
+    const r = mrs.find(r => r.scenario === s);
+    if (!r) return '<td>-</td>';
+    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
+  }).join('');
+  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
+}).sort((a,b) => b.avg - a.avg);
+sumBody.innerHTML = modelRows.map(r => r.html).join('');
+
+// Results table
+const resHead = document.querySelector('#results-table thead');
+const resBody = document.querySelector('#results-table tbody');
+const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
+resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
+
+let sortCol = 9, sortAsc = false;
+function renderResults() {
+  const sorted = [...DATA].sort((a,b) => {
+    const vals = [
+      [a.model, b.model],
+      [a.scenario, b.scenario],
+      [a.specEntities, b.specEntities],
+      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
+      [a.fixRounds, b.fixRounds],
+      [a.promptTokensEst, b.promptTokensEst],
+      [a.totalTokens, b.totalTokens],
+      [a.totalDurationMs, b.totalDurationMs],
+      [a.avgTokPerSec, b.avgTokPerSec],
+      [a.score, b.score],
+    ][sortCol];
+    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
+    return sortAsc ? cmp : -cmp;
+  });
+  resBody.innerHTML = sorted.map(r => {
+    const c = cls(r);
+    return `<tr>
+      <td class="model-name">${r.model}</td>
+      <td>${r.scenario}</td>
+      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
+      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
+      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
+      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
+      <td>${r.avgTokPerSec.toFixed(0)}</td>
+      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
+    </tr>`;
+  }).join('');
+  document.querySelectorAll('#results-table th').forEach((th,i) => {
+    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
+  });
+}
+document.querySelector('#results-table thead').addEventListener('click', e => {
+  const col = parseInt(e.target.dataset.col, 10);
+  if (isNaN(col)) return;
+  if (sortCol === col) sortAsc = !sortAsc;
+  else { sortCol = col; sortAsc = false; }
+  renderResults();
+});
+renderResults();
+</script>
+</body>
+</html>
--- a/kipina-codebench/results/2026-04-14T14-12.json
+++ b/kipina-codebench/results/2026-04-14T14-12.json
@@ -0,0 +1,115 @@
+[
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 1,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 133173,
+    "totalTokens": 13174,
+    "avgTokPerSec": 117.52479437665707,
+    "promptChars": 14102,
+    "promptTokensEst": 3526,
+    "score": 30,
+    "stars": "★★☆☆☆",
+    "error": null,
+    "profile": "large",
+    "promptName": "code-rs",
+    "round": 1
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 5,
+    "testsPassed": 0,
+    "testsFailed": 5,
+    "totalDurationMs": 267561,
+    "totalTokens": 27021,
+    "avgTokPerSec": 113.5812238661422,
+    "promptChars": 14052,
+    "promptTokensEst": 3513,
+    "score": 20,
+    "stars": "★☆☆☆☆",
+    "error": null,
+    "profile": "large",
+    "promptName": "code-rs",
+    "round": 2
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 0,
+    "totalTokens": 0,
+    "avgTokPerSec": 0,
+    "promptChars": 13914,
+    "promptTokensEst": 3479,
+    "score": 0,
+    "stars": "",
+    "error": "Puuttuvat: src/handlers.rs, src/lib.rs, src/main.rs, tests/api_test.rs",
+    "round": 3
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 2,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 162271,
+    "totalTokens": 16343,
+    "avgTokPerSec": 115.53039090208604,
+    "promptChars": 14062,
+    "promptTokensEst": 3516,
+    "score": 20,
+    "stars": "★☆☆☆☆",
+    "error": null,
+    "profile": "large",
+    "promptName": "code-rs",
+    "round": 4
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 1,
+    "testsPassed": 0,
+    "testsFailed": 1,
+    "totalDurationMs": 211367,
+    "totalTokens": 21183,
+    "avgTokPerSec": 113.22772767359652,
+    "promptChars": 14038,
+    "promptTokensEst": 3510,
+    "score": 20,
+    "stars": "★☆☆☆☆",
+    "error": null,
+    "profile": "large",
+    "promptName": "code-rs",
+    "round": 5
+  }
+]
--- a/kipina-codebench/results/2026-04-14T14-38.html
+++ b/kipina-codebench/results/2026-04-14T14-38.html
@@ -0,0 +1,183 @@
+<!DOCTYPE html>
+<html lang="fi">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Kipina Model Benchmark</title>
+<style>
+  :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
+  * { box-sizing: border-box; margin: 0; padding: 0; }
+  body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
+  h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
+  .meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
+  .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
+  .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
+  .card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
+  .card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
+  .card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
+  table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
+  th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
+  th:hover { color: var(--text); }
+  th.sorted-asc::after { content: ' ▲'; }
+  th.sorted-desc::after { content: ' ▼'; }
+  td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
+  tr:hover td { background: #1c2128; }
+  .pass { color: var(--green); }
+  .partial { color: var(--yellow); }
+  .fail { color: var(--red); }
+  .stars { letter-spacing: 1px; }
+  .bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
+  .bar-bg { background: var(--border); }
+  .bar-fill { background: var(--green); }
+  .bar-partial { background: var(--yellow); }
+  .model-name { font-weight: 600; }
+  h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
+  .summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
+</style>
+</head>
+<body>
+
+<h1>Kipina Model Benchmark</h1>
+<div class="meta" id="meta"></div>
+
+<div class="cards" id="cards"></div>
+
+<h2>Mallikohtainen yhteenveto</h2>
+<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>
+
+<h2>Kaikki tulokset</h2>
+<table id="results-table"><thead></thead><tbody></tbody></table>
+
+<script>
+const RAW = [{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":38807,"totalTokens":5667,"avgTokPerSec":183.83891911423427,"promptChars":21818,"promptTokensEst":5455,"score":40,"stars":"★★☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":1},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":178290,"totalTokens":26265,"avgTokPerSec":168.77786498646262,"promptChars":21840,"promptTokensEst":5460,"score":0,"stars":"☆☆☆☆☆","error":"Testit kaatuivat","profile":"large","promptName":"code-rs","round":2},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":151603,"totalTokens":22725,"avgTokPerSec":170.74115131582644,"promptChars":21750,"promptTokensEst":5438,"score":0,"stars":"☆☆☆☆☆","error":"Testit kaatuivat","profile":"large","promptName":"code-rs","round":3},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":0,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":41059,"totalTokens":6288,"avgTokPerSec":183.76827829344424,"promptChars":21848,"promptTokensEst":5462,"score":40,"stars":"★★☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":4},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":3,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":187666,"totalTokens":27278,"avgTokPerSec":166.24197655672018,"promptChars":21694,"promptTokensEst":5424,"score":0,"stars":"☆☆☆☆☆","error":"Testit kaatuivat","profile":"large","promptName":"code-rs","round":5}];
+
+const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
+function calcScore(r) {
+  if (r.error && r.testsTotal === 0) return 0;
+  let s = 0;
+  if (r.specOk) s += 10;
+  if (!r.error || r.testsTotal > 0) s += 10;
+  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);
+  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);
+  return Math.min(100, s);
+}
+// Fill in score, stars, and prompt-token estimate when they are missing
+const DATA = RAW.map(r => {
+  if (r.score == null) r.score = calcScore(r);
+  if (!r.stars) r.stars = starsFor(r.score);
+  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
+  return r;
+});
+const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
+const pctBar = (passed, total, w=80) => {
+  if (total === 0) return '-';
+  const pct = passed/total*100;
+  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
+  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
+};
+
+// Meta
+const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
+document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} ajoa — ${(totalTime/1000/60).toFixed(1)} min`;
+
+// Cards
+const models = [...new Set(DATA.map(r => r.model))];
+const scenarios = [...new Set(DATA.map(r => r.scenario))];
+const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
+const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
+const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
+const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
+const bestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
+}).sort((a,b) => b.avg - a.avg)[0];
+const fastestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
+}).sort((a,b) => b.speed - a.speed)[0];
+
+document.getElementById('cards').innerHTML = `
+  <div class="card"><div class="label">Keskiarvo</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} pistettä</div></div>
+  <div class="card"><div class="label">Testien läpäisy</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} testiä</div></div>
+  <div class="card"><div class="label">Paras malli</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
+  <div class="card"><div class="label">Nopein</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
+  <div class="card"><div class="label">Malleja</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} skenaariota</div></div>
+  <div class="card"><div class="label">Kokonaisaika</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minuuttia</div></div>
+`;
+
+// Summary table
+const sumHead = document.querySelector('#summary-table thead');
+const sumBody = document.querySelector('#summary-table tbody');
+sumHead.innerHTML = '<tr><th>Malli</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Yht.</th><th>Out tok</th><th>Aika</th><th>tok/s</th><th>Pisteet</th></tr>';
+
+const modelRows = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
+  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
+  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
+  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
+  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
+  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
+  const scenCols = scenarios.map(s => {
+    const r = mrs.find(r => r.scenario === s);
+    if (!r) return '<td>-</td>';
+    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
+  }).join('');
+  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
+}).sort((a,b) => b.avg - a.avg);
+sumBody.innerHTML = modelRows.map(r => r.html).join('');
+
+// Results table
+const resHead = document.querySelector('#results-table thead');
+const resBody = document.querySelector('#results-table tbody');
+const resCols = ['Malli','Skenaario','Speksi','Testit','Korjaus','Ctx','Out tok','Aika','tok/s','Pisteet'];
+resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
+
+let sortCol = 9, sortAsc = false;
+function renderResults() {
+  const sorted = [...DATA].sort((a,b) => {
+    const vals = [
+      [a.model, b.model],
+      [a.scenario, b.scenario],
+      [a.specEntities, b.specEntities],
+      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
+      [a.fixRounds, b.fixRounds],
+      [a.promptTokensEst, b.promptTokensEst],
+      [a.totalTokens, b.totalTokens],
+      [a.totalDurationMs, b.totalDurationMs],
+      [a.avgTokPerSec, b.avgTokPerSec],
+      [a.score, b.score],
+    ][sortCol];
+    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
+    return sortAsc ? cmp : -cmp;
+  });
+  resBody.innerHTML = sorted.map(r => {
+    const c = cls(r);
+    return `<tr>
+      <td class="model-name">${r.model}</td>
+      <td>${r.scenario}</td>
+      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
+      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
+      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
+      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
+      <td>${r.avgTokPerSec.toFixed(0)}</td>
+      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
+    </tr>`;
+  }).join('');
+  document.querySelectorAll('#results-table th').forEach((th,i) => {
+    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
+  });
+}
+document.querySelector('#results-table thead').addEventListener('click', e => {
+  const col = parseInt(e.target.dataset.col, 10);
+  if (isNaN(col)) return;
+  if (sortCol === col) sortAsc = !sortAsc;
+  else { sortCol = col; sortAsc = false; }
+  renderResults();
+});
+renderResults();
+</script>
+</body>
+</html>
--- a/kipina-codebench/results/2026-04-14T14-38.json
+++ b/kipina-codebench/results/2026-04-14T14-38.json
@@ -0,0 +1,117 @@
+[
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 38807,
+    "totalTokens": 5667,
+    "avgTokPerSec": 183.83891911423427,
+    "promptChars": 21818,
+    "promptTokensEst": 5455,
+    "score": 40,
+    "stars": "★★☆☆☆",
+    "error": null,
+    "profile": "large",
+    "promptName": "code-rs",
+    "round": 1
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 178290,
+    "totalTokens": 26265,
+    "avgTokPerSec": 168.77786498646262,
+    "promptChars": 21840,
+    "promptTokensEst": 5460,
+    "score": 0,
+    "stars": "☆☆☆☆☆",
+    "error": "Testit kaatuivat",
+    "profile": "large",
+    "promptName": "code-rs",
+    "round": 2
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 151603,
+    "totalTokens": 22725,
+    "avgTokPerSec": 170.74115131582644,
+    "promptChars": 21750,
+    "promptTokensEst": 5438,
+    "score": 0,
+    "stars": "☆☆☆☆☆",
+    "error": "Testit kaatuivat",
+    "profile": "large",
+    "promptName": "code-rs",
+    "round": 3
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 0,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 41059,
+    "totalTokens": 6288,
+    "avgTokPerSec": 183.76827829344424,
+    "promptChars": 21848,
+    "promptTokensEst": 5462,
+    "score": 40,
+    "stars": "★★☆☆☆",
+    "error": null,
+    "profile": "large",
+    "promptName": "code-rs",
+    "round": 4
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 3,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 187666,
+    "totalTokens": 27278,
+    "avgTokPerSec": 166.24197655672018,
+    "promptChars": 21694,
+    "promptTokensEst": 5424,
+    "score": 0,
+    "stars": "☆☆☆☆☆",
+    "error": "Testit kaatuivat",
+    "profile": "large",
+    "promptName": "code-rs",
+    "round": 5
+  }
+]
--- a/kipina-codebench/results/2026-04-14T14-52.html
+++ b/kipina-codebench/results/2026-04-14T14-52.html
@@ -0,0 +1,183 @@
+<!DOCTYPE html>
+<html lang="fi">
+<head>
+<meta charset="UTF-8">
+<meta name="viewport" content="width=device-width, initial-scale=1.0">
+<title>Kipina Model Benchmark</title>
+<style>
+  :root { --bg: #0d1117; --card: #161b22; --border: #30363d; --text: #e6edf3; --dim: #8b949e; --green: #3fb950; --yellow: #d29922; --red: #f85149; --blue: #58a6ff; }
+  * { box-sizing: border-box; margin: 0; padding: 0; }
+  body { font-family: -apple-system, 'Segoe UI', Helvetica, Arial, sans-serif; background: var(--bg); color: var(--text); padding: 2rem; max-width: 1400px; margin: 0 auto; }
+  h1 { font-size: 1.5rem; margin-bottom: 0.5rem; }
+  .meta { color: var(--dim); font-size: 0.85rem; margin-bottom: 2rem; }
+  .cards { display: grid; grid-template-columns: repeat(auto-fit, minmax(180px, 1fr)); gap: 1rem; margin-bottom: 2rem; }
+  .card { background: var(--card); border: 1px solid var(--border); border-radius: 8px; padding: 1rem; }
+  .card .label { color: var(--dim); font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; }
+  .card .value { font-size: 1.8rem; font-weight: 600; margin-top: 0.25rem; }
+  .card .sub { color: var(--dim); font-size: 0.8rem; margin-top: 0.25rem; }
+  table { width: 100%; border-collapse: collapse; background: var(--card); border: 1px solid var(--border); border-radius: 8px; overflow: hidden; margin-bottom: 2rem; }
+  th { background: #1c2128; text-align: left; padding: 0.6rem 0.8rem; font-size: 0.75rem; text-transform: uppercase; letter-spacing: 0.05em; color: var(--dim); cursor: pointer; user-select: none; white-space: nowrap; }
+  th:hover { color: var(--text); }
+  th.sorted-asc::after { content: ' ▲'; }
+  th.sorted-desc::after { content: ' ▼'; }
+  td { padding: 0.5rem 0.8rem; border-top: 1px solid var(--border); font-size: 0.85rem; white-space: nowrap; }
+  tr:hover td { background: #1c2128; }
+  .pass { color: var(--green); }
+  .partial { color: var(--yellow); }
+  .fail { color: var(--red); }
+  .stars { letter-spacing: 1px; }
+  .bar { display: inline-block; height: 8px; border-radius: 4px; vertical-align: middle; }
+  .bar-bg { background: var(--border); }
+  .bar-fill { background: var(--green); }
+  .bar-partial { background: var(--yellow); }
+  .model-name { font-weight: 600; }
+  h2 { font-size: 1.1rem; margin-bottom: 1rem; color: var(--dim); }
+  .summary-table th:first-child, .summary-table td:first-child { min-width: 200px; }
+</style>
+</head>
+<body>
+
+<h1>Kipina Model Benchmark</h1>
+<div class="meta" id="meta"></div>
+
+<div class="cards" id="cards"></div>
+
+<h2>Mallikohtainen yhteenveto</h2>
+<table class="summary-table" id="summary-table"><thead></thead><tbody></tbody></table>
+
+<h2>Kaikki tulokset</h2>
+<table id="results-table"><thead></thead><tbody></tbody></table>
+
+<script>
+const RAW = [{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":4,"testsTotal":1,"testsPassed":0,"testsFailed":1,"totalDurationMs":231122,"totalTokens":22952,"avgTokPerSec":113.75113825466987,"promptChars":17604,"promptTokensEst":4401,"score":20,"stars":"★☆☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":1},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":5,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":260314,"totalTokens":26144,"avgTokPerSec":113.40388181735229,"promptChars":17539,"promptTokensEst":4385,"score":0,"stars":"☆☆☆☆☆","error":"Testit kaatuivat","profile":"large","promptName":"code-rs","round":2},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":4,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":227228,"totalTokens":22381,"avgTokPerSec":113.5362722539456,"promptChars":17630,"promptTokensEst":4408,"score":0,"stars":"☆☆☆☆☆","error":"Testit kaatuivat","profile":"large","promptName":"code-rs","round":3},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":1,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":102052,"totalTokens":9984,"avgTokPerSec":117.77973450501808,"promptChars":17571,"promptTokensEst":4393,"score":30,"stars":"★★☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":4},{"model":"qwen3-coder:30b","scenario":"blog","reqOk":true,"specOk":true,"specEntities":2,"validationIssues":0,"fixRounds":2,"testsTotal":0,"testsPassed":0,"testsFailed":0,"totalDurationMs":146321,"totalTokens":14445,"avgTokPerSec":115.61186488022163,"promptChars":17589,"promptTokensEst":4397,"score":20,"stars":"★☆☆☆☆","error":null,"profile":"large","promptName":"code-rs","round":5}];
+
+const starsFor = s => s >= 90 ? '★★★★★' : s >= 70 ? '★★★★☆' : s >= 50 ? '★★★☆☆' : s >= 25 ? '★★☆☆☆' : s > 0 ? '★☆☆☆☆' : '☆☆☆☆☆';
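+function calcScore(r) {  // 0-100 score for one benchmark run
+  if (r.error && r.testsTotal === 0) return 0;  // failed before any test ran
+  let s = 0;
+  if (r.specOk) s += 10;  // the spec was valid
+  if (!r.error || r.testsTotal > 0) s += 10;  // always true after the guard above
+  if (r.testsTotal > 0) s += Math.round((r.testsPassed / r.testsTotal) * 60);  // up to 60 for the pass ratio
+  s += Math.max(0, 20 - (r.fixRounds || 0) * 10);  // 20 minus 10 per fix round, floored at 0
+  return Math.min(100, s);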
+}
+// Fill in score, stars, and the prompt-token estimate when they are missing
+const DATA = RAW.map(r => {
+  if (r.score == null) r.score = calcScore(r);
+  if (!r.stars) r.stars = starsFor(r.score);
+  if (!r.promptTokensEst) r.promptTokensEst = r.promptChars ? Math.round(r.promptChars / 4) : 0;
+  return r;
+});
+const cls = r => (!r.error && r.testsPassed === r.testsTotal && r.testsTotal > 0) ? 'pass' : (r.testsTotal > 0 && r.testsPassed > 0) ? 'partial' : 'fail';
+const pctBar = (passed, total, w=80) => {
+  if (total === 0) return '-';
+  const pct = passed/total*100;
+  const c = pct === 100 ? 'bar-fill' : 'bar-partial';
+  return `<span class="bar bar-bg" style="width:${w}px"><span class="bar ${c}" style="width:${Math.round(pct/100*w)}px"></span></span> ${passed}/${total}`;
+};
+
+// Meta
+const totalTime = DATA.reduce((s,r) => s + r.totalDurationMs, 0);
+document.getElementById('meta').textContent = `${new Date().toLocaleDateString('fi-FI')} — ${DATA.length} runs — ${(totalTime/1000/60).toFixed(1)} min`;
+
+// Cards
+const models = [...new Set(DATA.map(r => r.model))];
+const scenarios = [...new Set(DATA.map(r => r.scenario))];
+const avgScore = DATA.length ? Math.round(DATA.reduce((s,r) => s + r.score, 0) / DATA.length) : 0;
+const totalPassed = DATA.reduce((s,r) => s + r.testsPassed, 0);
+const totalTests = DATA.reduce((s,r) => s + r.testsTotal, 0);
+const passRate = totalTests ? Math.round(totalPassed/totalTests*100) : 0;
+const bestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, avg: Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length) };
+}).sort((a,b) => b.avg - a.avg)[0];
+const fastestModel = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  return { model: m, speed: Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length) };
+}).sort((a,b) => b.speed - a.speed)[0];
+
+document.getElementById('cards').innerHTML = `
+  <div class="card"><div class="label">Average</div><div class="value">${starsFor(avgScore)}</div><div class="sub">${avgScore} points</div></div>
+  <div class="card"><div class="label">Test pass rate</div><div class="value">${passRate}%</div><div class="sub">${totalPassed}/${totalTests} tests</div></div>
+  <div class="card"><div class="label">Best model</div><div class="value" style="font-size:1.2rem">${bestModel?.model || '-'}</div><div class="sub">${bestModel?.avg || 0}p</div></div>
+  <div class="card"><div class="label">Fastest</div><div class="value" style="font-size:1.2rem">${fastestModel?.model || '-'}</div><div class="sub">${fastestModel?.speed || 0} tok/s</div></div>
+  <div class="card"><div class="label">Models</div><div class="value">${models.length}</div><div class="sub">${scenarios.length} scenarios</div></div>
+  <div class="card"><div class="label">Total time</div><div class="value">${(totalTime/1000/60).toFixed(1)}</div><div class="sub">minutes</div></div>
+`;
+
+// Summary table
+const sumHead = document.querySelector('#summary-table thead');
+const sumBody = document.querySelector('#summary-table tbody');
+sumHead.innerHTML = '<tr><th>Model</th>' + scenarios.map(s => `<th>${s}</th>`).join('') + '<th>Total</th><th>Out tok</th><th>Time</th><th>tok/s</th><th>Score</th></tr>';
+
+const modelRows = models.map(m => {
+  const mrs = DATA.filter(r => r.model === m);
+  const tp = mrs.reduce((s,r) => s + r.testsPassed, 0);
+  const tt = mrs.reduce((s,r) => s + r.testsTotal, 0);
+  const tok = mrs.reduce((s,r) => s + r.totalTokens, 0);
+  const time = mrs.reduce((s,r) => s + r.totalDurationMs, 0);
+  const speed = Math.round(mrs.reduce((s,r) => s + r.avgTokPerSec, 0) / mrs.length);
+  const avg = Math.round(mrs.reduce((s,r) => s + r.score, 0) / mrs.length);
+  const scenCols = scenarios.map(s => {
+    const r = mrs.find(r => r.scenario === s);
+    if (!r) return '<td>-</td>';
+    return `<td class="${cls(r)}">${pctBar(r.testsPassed, r.testsTotal, 60)} <span style="color:var(--dim)">${(r.totalDurationMs/1000).toFixed(0)}s</span></td>`;
+  }).join('');
+  return { avg, html: `<tr><td class="model-name">${m}</td>${scenCols}<td>${pctBar(tp, tt)}</td><td>${(tok/1000).toFixed(1)}K</td><td>${(time/1000).toFixed(0)}s</td><td>${speed}</td><td><span class="stars">${starsFor(avg)}</span> ${avg}p</td></tr>` };
+}).sort((a,b) => b.avg - a.avg);
+sumBody.innerHTML = modelRows.map(r => r.html).join('');
+
+// Results table
+const resHead = document.querySelector('#results-table thead');
+const resBody = document.querySelector('#results-table tbody');
+const resCols = ['Model','Scenario','Spec','Tests','Fixes','Ctx','Out tok','Time','tok/s','Score'];
+resHead.innerHTML = '<tr>' + resCols.map((c,i) => `<th data-col="${i}">${c}</th>`).join('') + '</tr>';
+
+let sortCol = 9, sortAsc = false;
+function renderResults() {
+  const sorted = [...DATA].sort((a,b) => {
+    const vals = [
+      [a.model, b.model],
+      [a.scenario, b.scenario],
+      [a.specEntities, b.specEntities],
+      [a.testsPassed/Math.max(a.testsTotal,1), b.testsPassed/Math.max(b.testsTotal,1)],
+      [a.fixRounds, b.fixRounds],
+      [a.promptTokensEst, b.promptTokensEst],
+      [a.totalTokens, b.totalTokens],
+      [a.totalDurationMs, b.totalDurationMs],
+      [a.avgTokPerSec, b.avgTokPerSec],
+      [a.score, b.score],
+    ][sortCol];
+    const cmp = typeof vals[0] === 'string' ? vals[0].localeCompare(vals[1]) : vals[0] - vals[1];
+    return sortAsc ? cmp : -cmp;
+  });
+  resBody.innerHTML = sorted.map(r => {
+    const c = cls(r);
+    return `<tr>
+      <td class="model-name">${r.model}</td>
+      <td>${r.scenario}</td>
+      <td>${r.specOk ? `✓ ${r.specEntities}e` : '<span class="fail">✗</span>'}</td>
+      <td class="${c}">${pctBar(r.testsPassed, r.testsTotal)}</td>
+      <td>${r.fixRounds > 0 ? r.fixRounds + '×' : '-'}</td>
+      <td>${r.promptTokensEst > 0 ? '~'+(r.promptTokensEst/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${r.totalTokens > 0 ? (r.totalTokens/1000).toFixed(1)+'K' : '-'}</td>
+      <td>${(r.totalDurationMs/1000).toFixed(0)}s</td>
+      <td>${r.avgTokPerSec.toFixed(0)}</td>
+      <td><span class="stars">${r.stars}</span> ${r.score}p</td>
+    </tr>`;
+  }).join('');
+  document.querySelectorAll('#results-table th').forEach((th,i) => {
+    th.className = i === sortCol ? (sortAsc ? 'sorted-asc' : 'sorted-desc') : '';
+  });
+}
+document.querySelector('#results-table thead').addEventListener('click', e => {
+  const col = parseInt(e.target.dataset.col, 10);
+  if (isNaN(col)) return;
+  if (sortCol === col) sortAsc = !sortAsc;
+  else { sortCol = col; sortAsc = false; }
+  renderResults();
+});
+renderResults();
+</script>
+</body>
+</html>
--- a/kipina-codebench/results/2026-04-14T14-52.json
+++ b/kipina-codebench/results/2026-04-14T14-52.json
@@ -0,0 +1,117 @@
+[
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 4,
+    "testsTotal": 1,
+    "testsPassed": 0,
+    "testsFailed": 1,
+    "totalDurationMs": 231122,
+    "totalTokens": 22952,
+    "avgTokPerSec": 113.75113825466987,
+    "promptChars": 17604,
+    "promptTokensEst": 4401,
+    "score": 20,
+    "stars": "★☆☆☆☆",
+    "error": null,
+    "profile": "large",
+    "promptName": "code-rs",
+    "round": 1
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 5,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 260314,
+    "totalTokens": 26144,
+    "avgTokPerSec": 113.40388181735229,
+    "promptChars": 17539,
+    "promptTokensEst": 4385,
+    "score": 0,
+    "stars": "☆☆☆☆☆",
+    "error": "Testit kaatuivat",
+    "profile": "large",
+    "promptName": "code-rs",
+    "round": 2
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 4,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 227228,
+    "totalTokens": 22381,
+    "avgTokPerSec": 113.5362722539456,
+    "promptChars": 17630,
+    "promptTokensEst": 4408,
+    "score": 0,
+    "stars": "☆☆☆☆☆",
+    "error": "Testit kaatuivat",
+    "profile": "large",
+    "promptName": "code-rs",
+    "round": 3
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 1,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 102052,
+    "totalTokens": 9984,
+    "avgTokPerSec": 117.77973450501808,
+    "promptChars": 17571,
+    "promptTokensEst": 4393,
+    "score": 30,
+    "stars": "★★☆☆☆",
+    "error": null,
+    "profile": "large",
+    "promptName": "code-rs",
+    "round": 4
+  },
+  {
+    "model": "qwen3-coder:30b",
+    "scenario": "blog",
+    "reqOk": true,
+    "specOk": true,
+    "specEntities": 2,
+    "validationIssues": 0,
+    "fixRounds": 2,
+    "testsTotal": 0,
+    "testsPassed": 0,
+    "testsFailed": 0,
+    "totalDurationMs": 146321,
+    "totalTokens": 14445,
+    "avgTokPerSec": 115.61186488022163,
+    "promptChars": 17589,
+    "promptTokensEst": 4397,
+    "score": 20,
+    "stars": "★☆☆☆☆",
+    "error": null,
+    "profile": "large",
+    "promptName": "code-rs",
+    "round": 5
+  }
+]
--- a/Show More
+++ b/Show More
@@ -0,0 +1 @@
+You are a Python code fixer. Return ONLY the corrected Python file. No markdown fences, no explanations — just valid Python code.